Musical Collaboration Online

In February 2020 the music community, like many others, was knocked off its feet by the pandemic. As the weeks and months without live gigs or in-person collaboration rolled on, the community sought out ways of jamming, practicing, recording and collaborating online. As we cautiously emerge from the pandemic, many of the tools we used have become part of our normal processes.

Latency - the 800-pound gorilla in the room

If only playing in real time online were as simple as firing up a Zoom session. It's not. Round-trip latency (the time it takes a sound to travel out of your instrument or microphone, into your computer, over the internet to your collaborators, and back) is the enemy. It is the dreaded echo: you play your note locally, then hear it a moment later through your headphones with an annoying delay.

The discussion in the next few sections applies to all collaboration software. Optimizing your hardware and software, and understanding the physical limits on how quickly sound can travel between collaborators, will matter more than which software you choose.

Before going further, you should be familiar with the three tools we use both for recording in the home studio and for collaborating online: the recording interface, the computer, and the Digital Audio Workstation (DAW) software. If you need to learn more, please read through the first three sections of the home studio recording gear page on this website.

Our brains can cope with around 10-25 milliseconds (thousandths of a second) of "one-way" latency without a problem. A 25 millisecond delay is roughly what you hear live when you are about 25 feet from the monitor wedge, since sound travels through air at about a foot per millisecond. Above 40 ms, playing together becomes unworkable. But that's just half the picture: when you collaborate with another musician online, the round-trip latency between the two of you is the sum of the one-way latencies in each direction, from you to them and from them back to you.
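
The arithmetic behind those figures is simple enough to sketch. The snippet below assumes a speed of sound of about 1,125 feet per second (air at roughly room temperature); the function names are illustrative only.

```python
# Speed of sound in air at roughly 20 deg C (68 deg F); an assumed round figure.
SPEED_OF_SOUND_FT_PER_S = 1125.0


def equivalent_distance_ft(delay_ms: float) -> float:
    """How far from a speaker you would stand for its sound to reach you delay_ms late."""
    return SPEED_OF_SOUND_FT_PER_S * delay_ms / 1000.0


def round_trip_ms(you_to_them_ms: float, them_to_you_ms: float) -> float:
    """Round-trip latency between two players: the two one-way delays added together."""
    return you_to_them_ms + them_to_you_ms


print(f"25 ms one way is like standing {equivalent_distance_ft(25):.0f} ft from the wedge")
print(f"12 ms each way adds up to {round_trip_ms(12, 12):.0f} ms round trip")
```

At about 1.1 feet per millisecond, 25 ms works out to a little under 30 feet, the same ballpark as the rule of thumb above.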

Latency comes from several sources.

A Suboptimal Audio Buffer Size: Buffer size is the number of samples that your computer bites off at a time to process. You adjust it either in the software that drives your recording interface or, if your online collaboration software communicates with the interface directly, from within the collaboration software. The table to the left is an example of how buffer size and sample rate affect latency for a standard USB recording interface. Your mileage may vary: Thunderbolt interfaces are generally faster than USB interfaces (the signal moves more quickly between your interface and your computer), and latency also varies by manufacturer. Your interface software will tell you the actual latency contribution when you change the buffer size.
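
The buffer's own share of that latency is just the buffer size divided by the sample rate. The sketch below prints that figure for common settings, along with how many times per second the computer has to process a buffer. It covers a single buffer in one direction only; real interfaces add input and output buffering plus driver and converter overhead, so the number your interface software reports will be higher.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: int) -> float:
    """Duration of one buffer of audio, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz


def buffers_per_second(buffer_samples: int, sample_rate_hz: int) -> float:
    """How often the computer must process a buffer (smaller buffers mean more often)."""
    return sample_rate_hz / buffer_samples


for rate in (44_100, 48_000):
    for size in (32, 64, 128, 256, 512):
        print(
            f"{rate / 1000:g} kHz, {size:>3} samples: "
            f"{buffer_latency_ms(size, rate):5.2f} ms per buffer, "
            f"{buffers_per_second(size, rate):7.1f} buffers/s"
        )
```

The second column is the "smaller bites, more often" cost: at 48 kHz, a 32-sample buffer has to be filled and processed 1,500 times a second.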

From the chart it might seem like a no-brainer: just set the buffer size to 32 samples and go. But the smaller the buffer, the harder your computer has to work (smaller bites of data, processed more often). If that extra work causes the computer to fall behind, it can contribute to jitter. Jitter (missing or mis-ordered data packets) results in unpleasant digital noise often described as crackling.

Your collaboration software can be set to try to correct for jitter, but doing so means holding audio back a little longer so the processing can catch up, and that adds latency. So it's a tradeoff: jitter vs. latency.

I recommend using 48 kHz as your sample rate and starting at a buffer size of 256 samples. Then try lowering it to 128. If you don't hear crackling, try lowering it to 64, and possibly even 32 samples. But you may find that your limit is 256 or 128, in which case your computer's buffer will contribute roughly 2.7-5.3 ms of latency (128 or 256 samples divided by 48,000 samples per second), and twice that once you count both you and your collaborator.
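
The step-down procedure above can be spelled out as a small loop. Here `sounds_clean` is a hypothetical stand-in for you listening for crackles at each setting; no collaboration app exposes such a function, so treat this as a precise restatement of the procedure rather than something to run against your interface.

```python
from typing import Callable, Optional, Sequence


def smallest_clean_buffer(
    sounds_clean: Callable[[int], bool],
    sizes: Sequence[int] = (256, 128, 64, 32),
) -> Optional[int]:
    """Step down through buffer sizes, stopping at the first one that crackles.

    sounds_clean is hypothetical: it stands in for you listening at that size.
    Returns the smallest size that sounded clean, or None if even the first
    (largest) size crackled, in which case try a larger buffer such as 512.
    """
    best = None
    for size in sizes:
        if not sounds_clean(size):
            break  # crackling: the previous size is your limit
        best = size
    return best


# Example: a computer that stays clean down to 128 samples but crackles at 64.
print(smallest_clean_buffer(lambda size: size >= 128))  # -> 128
```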

Your internet connection: First things first - Wi-Fi is a no-go. So are Bluetooth headphones. You will need to plug wired headphones into the headphone jack of your interface, and connect your computer with an ethernet cable, either directly to your router or to a switch that is itself connected to your router by ethernet cable. If you have a wireless mesh system, plugging your computer into the ethernet port of a wireless mesh point will generally not do the trick. A fiber gigabit (1 gigabit per second) connection or faster is best. Anything below 200 megabits per second will be too slow.

Slower or lower-quality internet connections can also cause jitter, from data packets that go missing or arrive in the wrong order. As with buffer size, your software has two ways of dealing with this: (a) keep plowing ahead and ignore the missing or mis-ordered packets, accepting the digital noise they cause, or (b) take a time-out to wait for missing packets and re-order late ones so they can be played properly, deliberately adding latency in the process (auto jitter adjustment). Again, it's a tradeoff between latency and digital noise. You choose which approach the software takes with its auto-jitter adjustment setting, which handles jitter from all sources.
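
Option (b) is what a jitter buffer does. The class below is a minimal sketch of the idea, not how Sonobus or any other app implements it: it assumes each packet carries a sequence number, holds a fixed number of packets (`depth`) before releasing the oldest, puts late arrivals back in order, and emits a gap (`None`) for packets that never arrive. Each extra packet of depth gives stragglers more time to show up, at the cost of one more packet's worth of latency.

```python
import heapq
import itertools
from typing import Any, List, Optional, Tuple


class JitterBuffer:
    """Reorders packets by sequence number, trading added latency for fewer gaps."""

    def __init__(self, depth: int) -> None:
        self.depth = depth  # packets held back before playback (more depth = more latency)
        self.heap: List[Tuple[int, int, Any]] = []  # (sequence number, arrival order, payload)
        self.arrivals = itertools.count()  # tie-breaker so payloads are never compared
        self.next_seq = 0  # next sequence number due to be played

    def push(self, seq: int, payload: Any) -> None:
        """Accept an arriving packet, ignoring it if its slot has already been played."""
        if seq >= self.next_seq:
            heapq.heappush(self.heap, (seq, next(self.arrivals), payload))

    def pop_ready(self) -> List[Optional[Any]]:
        """Release packets in sequence order while more than `depth` are waiting.

        A packet that never arrived comes out as None: a gap the player has to
        cover somehow, for example with silence.
        """
        out: List[Optional[Any]] = []
        while len(self.heap) > self.depth:
            seq, _, payload = heapq.heappop(self.heap)
            if seq < self.next_seq:
                continue  # duplicate of a packet that was already played
            while self.next_seq < seq:
                out.append(None)  # missing packet: emit a gap
                self.next_seq += 1
            out.append(payload)
            self.next_seq = seq + 1
        return out


jb = JitterBuffer(depth=2)
for seq, payload in [(0, "a"), (2, "c"), (1, "b"), (4, "e")]:  # packet 1 arrives late
    jb.push(seq, payload)
print(jb.pop_ready())  # -> ['a', 'b']  (put back in order)
jb.push(5, "f")
print(jb.pop_ready())  # -> ['c']
jb.push(6, "g")
print(jb.pop_ready())  # -> [None, 'e']  (packet 3 never arrived)
```

Setting `depth` to 0 corresponds to option (a): packets play as soon as they arrive, and anything late or missing simply becomes noise.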

Your computer: Your audio drivers contribute to latency. If you are using a Windows PC, be sure to use an ASIO driver, such as the free ASIO4ALL driver, rather than the standard Windows audio driver. The more powerful the computer, the more load it can handle before jitter sets in.

Another tradeoff is the degree of compression applied to the audio passed between collaborators. Sending larger, uncompressed audio streams will hurt users with less internet bandwidth, while smaller, compressed streams put more load on the computer, which has to compress and decompress the audio in real time. If you are fortunate enough to have gigabit service, sending uncompressed audio is your best option. A user with slower internet and a more powerful computer might be better off with compressed audio.
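
To put rough numbers on the bandwidth side of that tradeoff: uncompressed PCM audio needs sample rate × bit depth × channels bits per second. The sketch assumes 24-bit stereo at 48 kHz, ignores packet-header overhead, and assumes each player sends a separate stream to every other player; the 256 kbit/s compressed rate is an arbitrary illustrative figure, not any particular app's setting.

```python
def pcm_mbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Bit rate of uncompressed PCM audio in megabits per second (no packet overhead)."""
    return sample_rate_hz * bit_depth * channels / 1_000_000


def upload_mbps(stream_mbps: float, players: int) -> float:
    """Upload needed if each player sends its own stream to every other player."""
    return stream_mbps * (players - 1)


uncompressed = pcm_mbps(48_000, 24, 2)  # 2.304 Mbit/s per stream
compressed = 0.256                      # hypothetical compressed rate, Mbit/s
for players in (2, 4, 6):
    print(
        f"{players} players: uncompressed {upload_mbps(uncompressed, players):.1f} Mbit/s up, "
        f"compressed {upload_mbps(compressed, players):.2f} Mbit/s up"
    )
```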

Distance: Here's the real showstopper. We can't beat the laws of physics; distance matters. At best, your audio data packets would travel at the speed of light (186,282 miles/second) in a straight line from you to your collaborators, but that is never the case. Your data may zig and zag, and it may slow down passing through a crowded network switch. Even if conditions were perfect, it would take your packet about 0.013 seconds (13 ms) to travel the 2,451 miles from NYC to LA as the crow flies, and another 13 ms to return, or 26 ms round-trip. In real life it is closer to twice that, or 50+ ms. So if we want to keep the round trip under 25 ms once every source of latency is added up, we get our best results when all of our collaborators are within 500 miles of each other, and the closer they are, the better.
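
The distance figures follow from dividing miles by the speed of light. In the sketch below, `routing_factor` is an assumed fudge factor for everything that makes real paths slower than a straight line at full speed (detours, busy switches, and the fact that light in optical fiber travels at only about two-thirds of its speed in a vacuum); a value of 2.0 matches the "closer to twice that" estimate above.

```python
SPEED_OF_LIGHT_MI_PER_S = 186_282


def propagation_round_trip_ms(miles: float, routing_factor: float = 1.0) -> float:
    """Round-trip travel time for a packet over `miles` of straight-line distance.

    routing_factor = 1.0 is the physical best case; larger values model detours,
    congested switches, and slower-than-vacuum fiber.
    """
    one_way_s = miles / SPEED_OF_LIGHT_MI_PER_S * routing_factor
    return 2 * one_way_s * 1000


for label, miles in (("NYC-LA", 2_451), ("500 miles", 500), ("30 miles", 30)):
    best = propagation_round_trip_ms(miles)
    typical = propagation_round_trip_ms(miles, routing_factor=2.0)
    print(f"{label}: {best:.1f} ms best case, ~{typical:.1f} ms with routing_factor=2")
```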

So two collaborators 500 miles apart with gigabit service might have:
- 5 ms round-trip from distance (the best case: a straight line at the speed of light, about 2.7 ms each way; real-world routing can roughly double it)
- 8 ms round-trip computer processing time
- 8 ms round-trip from the need for the computers to correct for jitter

That gets us to a workable 21 ms round-trip in theoretical latency, right about our target. In practice, latency varies from session to session, and even during a session. Latency isn't just annoying; it often leads to something referred to as the "toilet bowl effect." Especially with up-tempo tunes, the drummer in NY might perceive that the bass player in Chicago is playing behind the beat because of latency. She has two choices: either keep playing at a consistent tempo, which sounds to her like she is playing ahead of her bass player, or, more instinctively, trust her ears and slow down slightly, perhaps unconsciously.

The first is the correct choice for the sake of the session, but it can make for a far less enjoyable musical experience for the drummer. The second sounds better to her in the moment, but then the bass player hears the drummer slowing, and he slows. Then the drummer hears the bass player slowing, and so on, with the tempo circling steadily downward. Thus, the toilet bowl effect.
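
A deliberately simplified model makes the spiral easy to see. It assumes both players hear each other with the same one-way delay and that, on every beat, each stretches their own beat by a fixed fraction (`correction`) of the lag they hear; both the model and the parameter are illustrative assumptions, not measurements. A `correction` of 0 is the drummer holding a steady tempo.

```python
from typing import List


def toilet_bowl(start_bpm: float, one_way_latency_ms: float,
                correction: float, beats: int) -> List[float]:
    """Tempo (BPM) after each beat when both players slow to match what they hear.

    Each beat, each player hears the other one_way_latency_ms late and stretches
    their own beat by correction times that lag. Because both do the same, the
    lag never goes away and the stretch accumulates beat after beat.
    """
    period_s = 60.0 / start_bpm
    lag_s = one_way_latency_ms / 1000.0
    tempos = []
    for _ in range(beats):
        period_s += correction * lag_s
        tempos.append(60.0 / period_s)
    return tempos


steady = toilet_bowl(120, 15, correction=0.0, beats=32)
drifting = toilet_bowl(120, 15, correction=0.1, beats=32)
print(f"steady: {steady[-1]:.1f} BPM, drifting: {drifting[-1]:.1f} BPM after 32 beats")
```

Even a small correction per beat compounds: in this model, two players starting at 120 BPM with 15 ms of one-way latency lose roughly 10 BPM in eight bars of 4/4.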

Sonobus

Sonobus, created by the talented (and very friendly) Jesse Chappell, is the cleanest, most feature-rich, and highest-fidelity collaboration software that I have used. Oh, and it's free (though donations to the developer are welcome). It is available for Mac, Windows, iOS, and GNU/Linux.

Sonobus checks all the boxes for collaboration software.

Live jamming in a private room. Sonobus produces the most musical experience I have had so far for getting my band together virtually just to play. Latency is as low as I have seen, and there is full flexibility to tweak settings to minimize it further. It also goes at least three steps further than most competitors:
- The intuitive mixer board has really high quality integrated compression and EQ, so you can dial in your sound and save it.
- The software comes with VST2, VST3, and AU plugins that you can drop onto the Main Out of your DAW to run Sonobus inside your DAW. Simply double-click the plugin and the Sonobus screen appears inside your DAW. This has several advantages.
  - I can use the same effects plugins and settings that I use when recording and mixing when I am collaborating online. Huge time savings - I don't have to replicate all my Compression and EQ settings in the Sonobus mixer.
  - Electric bass or electric guitar players have the option of using an amp simulator inside their DAW for the collaboration session in Sonobus.
  - If I am running a Sonobus session with several players in the same room, I can send one (mono or stereo) bus per player to its own Sonobus channel in the mixer, rather than sending all of the individual channels.
- Extremely flexible jitter buffer adjustments
Recording. With one click you can record the entire session with many options for how you would like the tracks to be saved. The results I have gotten from Sonobus have been stunning, and it is simple to export the tracks and drop them into my DAW for further editing.
Sitting in/meeting musicians. Sonobus does have a public room mode, but the Sonobus community seems to be oriented toward private rooms so far. JamKazam (below) is used more consistently for "open rooms," but its quality is lower, in my opinion.

Unlike JamKazam, Sonobus doesn't use your computer's built-in mic as a talkback mic, but you can easily plug a mic into your interface (if you have an open mic input) and mix it in. Nor does it have integrated video, but you can always run Zoom on a separate device alongside Sonobus if video is important to you.

With three to four players living less than 30 miles from each other, all with gigabit internet, we have been getting 18-20 ms round-trip latency. Michael Eskin has produced a series of very helpful YouTube videos, including this well-thought-out guide to latency management in Sonobus. Kudos to Jesse. Highly recommended.

Other software alternatives

JamKazam is similar to Sonobus. It is a paid service at most service levels, and it does integrate video and a talkback mic, which some users may find helpful. (You don't need a Zoom account, and because you can use your computer's mic as your talkback mic, it could save you a channel on your interface.) It also has a much more active public room community, so if you are looking to sit in with strangers, it is a good bet.

But I found the audio quality and technical polish of the app not quite up to Sonobus's standard, so for my purposes I prefer Sonobus.

Jamulus is an open-source collaboration application that might also be of interest, but I found it less useful than either Sonobus or JamKazam.

The JACK Audio Connection Kit may be of use to those who are more technically inclined.