Binaural Audio for Narrative VR

Oculus Story Studio Blog
Posted by Thomas Bible
May 31, 2016

George Lucas once said that audio is 50% of the experience. This has never been truer than it is with VR, where we hijack your perception and transport you to new worlds.

With VR, we use our knowledge of optics to simulate visual perception and create an immersive experience. But we can’t have a completely immersive experience without high-quality audio that simulates how hearing works as well. VR audio gives us a number of powerful tools to achieve this, like being able to direct the viewer’s attention, enhance user feedback, and drive emotions.

Simulating a full environment often creates an overwhelming number of things to experience, but well-crafted audio directs our attention to the most important elements in a scene—including those outside our field of vision. Audio also supplies us with important feedback that turns an ordinary physics interaction into something truly compelling and immersive. Most importantly, audio drives our emotions much more effectively than anything else because hearing is our fastest sense. Try watching a scene from a movie with the sound off to see what we mean.

So what is binaural audio?

Binaural audio was first used around 1881, but it has remained a novelty because the listener had to keep their head stationary to experience it fully. That is no longer the case. By simulating the way our hearing works, while also tracking head movement, we can create an audio experience that sounds “real”, with the freedom to move around and explore a virtual acoustic space.

To do this, binaural audio simulates various elements of acoustic physics. These include the inter-aural time difference (the difference in the time it takes a sound to travel from the source to one ear, compared to the other); environmental reflections (most of the sound we hear does not come to us directly from the source, but rather as reflections from walls, floors, and other surfaces); and filtering of sounds by your body (the shape of our ears and bodies affects the timbre of the sounds we hear; try covering the tops of your ears with your hands to hear what we mean), among many other effects.
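To make the inter-aural time difference concrete, here is a minimal sketch using Woodworth's spherical-head approximation, a common textbook model. It is not a description of how the Oculus Spatializer computes anything; the head radius, the speed of sound, and the function name are all assumptions chosen for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly room temperature
HEAD_RADIUS_M = 0.0875      # assumption: a commonly quoted average head radius


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate ITD in seconds for a distant source.

    Uses Woodworth's spherical-head formula: (r / c) * (theta + sin(theta)).
    azimuth_deg is the angle away from straight ahead, from 0 to 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        itd_us = interaural_time_difference(azimuth) * 1e6
        print(f"{azimuth:>2} degrees: {itd_us:.0f} microseconds")
```

With these assumed values, a sound directly to one side arrives at the far ear roughly two thirds of a millisecond after the near ear, and a sound straight ahead arrives at both ears at the same time.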

The binaural Oculus Spatializer plug-in, available as a free download.

For Henry and Dear Angelica, we used the binaural Oculus Spatializer, a free plug-in available on the Oculus developer site. This plug-in does much of the complex acoustic simulation work, allowing us to focus on the more creative side of binaural VR audio.

Binaural vs. non-binaural audio

It’s fairly easy to create audio for VR with traditional techniques, which provide more control over how the sound is played back—for example using stereo sounds with no real-time binaural processing. But even if those stereo sounds are binaurally pre-processed, the sense of immersion can be easily broken. This is because when you turn your head, all the sounds turn with you, which can immediately break the sense of presence in a virtual space.

We found that there is a spectrum for creating sound in VR—with completely binaural sounds on one end (100% acoustic accuracy), and sounds playing back exactly as designed on the other (100% control). The binaural end of the spectrum represents a loss of control over the mix and timbre of the audio, while the more controlled end is immersion-breaking to the listener.

Film and games tend to live at the most controlled end of this spectrum. But in order to be truly immersive, VR audio needs to embrace the chaos and live on the binaural side. With binaural audio, we can develop the tools and language the medium needs, solve the issues it brings up, and take advantage of the greater immersion it creates.

Successful sound design

Creating sounds using binaural acoustic modelling adds a few creative restrictions. In previous games we’ve worked on, each character plays their audio from a single source, usually in the center of the character. Binaural audio lets us hear exactly where sounds are coming from. Now, with each character, we have to think about how many total sources we need and where to place them. Henry had 3 mono sources: one for his mouth (his voice), one for his torso (his hands), and one for his feet (his footsteps). Also, all these sounds have realistic fall-off curves that model loudness with distance in the real world.
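As a rough illustration of what a realistic fall-off implies, the sketch below applies the inverse-distance law for a point source in open space, where level drops by about 6 dB for every doubling of distance. The exact curves used in Henry are not given here, so treat the function and its name as assumptions.

```python
import math


def level_change_db(old_distance_m: float, new_distance_m: float) -> float:
    """Change in level (dB) when the listener-to-source distance changes.

    Assumes the inverse-distance law for a point source in free field.
    Positive means louder, negative means quieter.
    """
    if old_distance_m <= 0 or new_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(old_distance_m / new_distance_m)


if __name__ == "__main__":
    # Stepping back from 1 m to 2 m, then leaning in from 2 m to 1 m.
    print(f"{level_change_db(1.0, 2.0):+.1f} dB")
    print(f"{level_change_db(2.0, 1.0):+.1f} dB")
```

Under this model, a single step toward or away from a nearby source shifts its level by several decibels, which is part of why the listener's movement has such a strong effect on the mix.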

One key to acoustically immersing someone in a space is understanding the way the space itself affects sound. Since most of the audio in the real world comes to you as reflections off walls, floors, and other surfaces, we couldn’t bake reverb into our sources (as we would in a game), since you’d lose the sense of the sound bouncing off surfaces around you. However, there was one place where including reverb in sounds worked well: when Henry is in the kitchen at the beginning of the experience. This worked because all the sounds come to you through the kitchen door, which is a bit like a point source.

Henry in the kitchen – All the sound from the kitchen comes to the listener through the doorway, allowing us to make all the kitchen audio into a single mono source.

How we mix

Losing control over attenuations and other elements of audio makes mixing in VR tricky. For Henry we chose to mix at Skywalker Sound to take advantage of their linear mixing expertise. However, because we were going to convert this mix to be played back in an interactive environment, it meant we had to create our “guide mix” on the big screen in mono, without panning or distance attenuation. So we mixed roughly 65 mono stems as best we could, and then placed them in the Henry experience to test, using an iterative approach. This was helped by the fact that we mixed with only one attenuation curve, since all the sounds had similar “real world” loudness (such as footsteps, plates falling, and doors opening). Having just one attenuation curve meant that we could mix using gain levels on the tracks, and we established expectations as to how far a sound was from the listener.

Mixing Henry at Skywalker Sound, with mixer Steve Boeddeker (left) and director Ramiro Lopez Dau (right)

Dear Angelica, by contrast, contains a few sounds that are much louder than normal sounds, like an explosion, so we used three attenuation curves for that project. These curves establish “loudness categories” that help listeners judge how loud a sound is relative to its distance.
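One way to picture loudness categories is as a small table of attenuation curves that differ only in how far a sound carries before it starts to fall off. The sketch below is purely illustrative: the category names, the reference distances, and the clamped inverse-distance curve are invented for the example and are not the values used in Dear Angelica.

```python
# Hypothetical categories: the distance (in meters) inside which a sound
# plays at full level. Beyond it, gain falls off with inverse distance.
REFERENCE_DISTANCE_M = {
    "quiet": 0.5,
    "normal": 1.0,
    "loud": 10.0,
}


def distance_gain(category: str, distance_m: float) -> float:
    """Linear gain (0 to 1) for a source in the given loudness category."""
    reference = REFERENCE_DISTANCE_M[category]
    # Guard against division by zero when the source sits on the listener.
    return min(1.0, reference / max(distance_m, 1e-6))


if __name__ == "__main__":
    for category in REFERENCE_DISTANCE_M:
        print(category, round(distance_gain(category, 20.0), 3))
```

Because every sound in a category shares one curve, the remaining balance between sounds can be set with plain track gain, in the spirit of the mixing approach described above.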

When mixing, it’s also worth noting that just as too much sound can be overwhelming to a listener, too much spatial sound can also be overwhelming. In Henry, we addressed this in busy moments by collapsing sounds that were far away from the listener (but close to each other) down to single sources. This approach also saves a bit of CPU time!
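The collapsing step could be sketched as a simple greedy grouping: sources beyond some distance from the listener are merged with far neighbors that sit close to them, and each group is replaced by one source at the group's centroid. The thresholds, the `Source` type, and the function name are hypothetical, and a real implementation would also need to sum the audio of the merged sources.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Source:
    name: str
    position: Vec3


def collapse_distant_sources(
    sources: List[Source],
    listener: Vec3,
    far_threshold_m: float = 8.0,   # assumption: what counts as "far away"
    cluster_radius_m: float = 2.0,  # assumption: what counts as "close together"
) -> List[Source]:
    """Keep near sources as they are; merge far sources that sit together."""
    near: List[Source] = []
    clusters: List[List[Source]] = []
    for src in sources:
        if math.dist(src.position, listener) < far_threshold_m:
            near.append(src)
            continue
        # Join the first cluster whose first member is close enough.
        for cluster in clusters:
            if math.dist(src.position, cluster[0].position) <= cluster_radius_m:
                cluster.append(src)
                break
        else:
            clusters.append([src])

    merged: List[Source] = []
    for cluster in clusters:
        if len(cluster) == 1:
            merged.append(cluster[0])
            continue
        count = len(cluster)
        centroid = tuple(
            sum(s.position[axis] for s in cluster) / count for axis in range(3)
        )
        merged.append(Source("+".join(s.name for s in cluster), centroid))
    return near + merged
```

Fewer spatialized sources means fewer binaural filters running at once, which is where the CPU saving comes from.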

What about music?

Music in VR presents a real challenge since it doesn’t live naturally in the world like other sounds do. We tried a number of ways to make music fit, with varying results:

For the main Henry experience, we went with 2-D music, played straight to your headphones with no binaural processing. This actually worked fairly well, since most people are used to hearing music on headphones while also hearing sounds from the outside world. The trouble with this is that the binaural sound effects change dramatically in volume as you move (by more than 3 dB over one meter), meaning that it’s not possible to have the music perfectly balanced with the SFX mix.

For the Henry trailer, we tried something different: We broke the music apart into stems, one for each instrument section, and placed them around you in an arc. The only reason we were able to do this was because Henry was in an unlit black room, so having floating “phantom sources” playing music at you—without the sounds coming from something—wasn’t an issue. When we tried the same thing with the room visible, the phantom source issue became much more noticeable and jarring. Clearly, context is everything.

The Henry Trailer—Not being able to see the environment meant we were able to place audio wherever we wanted without it being jarring to listeners.
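The arc layout used for the trailer music can be sketched as evenly spaced points on a circle segment centered in front of the listener. The radius, the arc width, and the coordinate convention (listener at the origin, +z forward, +x to the right) are assumptions; none of these values are stated for the actual trailer.

```python
import math
from typing import List, Tuple


def arc_positions(
    count: int,
    radius_m: float = 2.0,       # assumption: distance of each stem
    arc_degrees: float = 120.0,  # assumption: total width of the arc
) -> List[Tuple[float, float]]:
    """Return (x, z) positions for `count` stems spread across an arc."""
    if count < 1:
        raise ValueError("count must be at least 1")
    if count == 1:
        return [(0.0, radius_m)]  # a single stem goes straight ahead
    half_arc = math.radians(arc_degrees) / 2.0
    step = math.radians(arc_degrees) / (count - 1)
    positions = []
    for index in range(count):
        angle = -half_arc + index * step  # left edge to right edge
        positions.append((radius_m * math.sin(angle), radius_m * math.cos(angle)))
    return positions


if __name__ == "__main__":
    # Hypothetical stem names, one per instrument section.
    stems = ["strings", "woodwinds", "brass", "percussion"]
    for stem, (x, z) in zip(stems, arc_positions(len(stems))):
        print(f"{stem:<10} x={x:+.2f} m, z={z:+.2f} m")
```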

The other problem with placing music sources around you is that you can walk right up to one of the sources, unbalancing the mix. This can easily be solved by having the sources move with your head (aka “headlocked”). However, in some contexts, this will make the phantom source issue more apparent.
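A headlocked source keeps a fixed offset in the listener's head space, so its world position has to be recomputed from the head pose every frame. The sketch below handles position and yaw only and uses an assumed convention (+z forward, +x right, positive yaw turning toward +x); a real engine would use the full head transform, and the function name is hypothetical.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) on the horizontal plane


def head_to_world(offset: Vec2, head_position: Vec2, head_yaw_deg: float) -> Vec2:
    """World position of a source fixed at `offset` relative to the head."""
    yaw = math.radians(head_yaw_deg)
    offset_x, offset_z = offset
    # Rotate the head-relative offset by the head's yaw, then translate.
    world_x = head_position[0] + offset_x * math.cos(yaw) + offset_z * math.sin(yaw)
    world_z = head_position[1] - offset_x * math.sin(yaw) + offset_z * math.cos(yaw)
    return (world_x, world_z)


if __name__ == "__main__":
    stem_offset = (0.0, 2.0)  # two meters straight ahead of the listener
    for yaw in (0.0, 45.0, 90.0):
        print(yaw, head_to_world(stem_offset, (5.0, 5.0), yaw))
```

However the listener walks or turns, the source stays two meters in front of them, so they can never approach it and unbalance the mix.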

Despite the 135 years since binaural audio’s invention, we’re still in the early days of figuring out how to create authentic real-time immersive audio with the kind of control that we’re used to in other mediums. As we share and build on what we learn, we’ll make these kinds of exciting technical leaps happen faster and with greater impact.