They say that seeing is believing, but at OC5, Oculus Chief Scientist Michael Abrash made the case that accurate sound rendering plays a critical role in creating believable VR experiences—and a key part of sound rendering is simulation of the environment and its acoustics. In their Oculus Connect talk entitled “Spatial Audio for Oculus Quest and Beyond,” Audio Design Manager Tom Smurdon and Software Engineer Pete Stirling discussed how to create high-fidelity audio experiences for Oculus Quest and Rift and shared details on some future technology coming to the Audio SDK. Today, we’re excited to share a behind-the-scenes look at the work being done at Facebook Reality Labs to inform those new advances.
When sound is produced in the real world, it interacts with the environment in a complex way. The vibrations of an object cause sound pressure waves to propagate through the air, where those waves are scattered by surfaces like walls, floors, and ceilings. This changes the sound before it reaches our ears. The sound we hear is really the sum of many echoes of the original source that arrive at different times. If the source is visible, the first sound we hear is the direct sound, which travels along the shortest path from the source to the listener. Next, sound that has reflected off of nearby surfaces will arrive at the listener from various directions. These are called early reflections. The remainder of the sound is reverberation, which consists of a buildup of many delayed echoes that decay smoothly over time.
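The three components described above can be illustrated with a toy room impulse response. This is a minimal sketch, not FRL's actual renderer: the reflection distances, absorption factor, and reverb level are made-up illustrative values, and the reverb tail is modeled as exponentially decaying noise reaching -60 dB at the RT60 time.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
SAMPLE_RATE = 48_000    # Hz

def toy_impulse_response(direct_dist, reflection_dists, rt60, length_s=1.0):
    """Build a toy room impulse response: direct sound, early
    reflections, and an exponentially decaying reverb tail."""
    n = int(length_s * SAMPLE_RATE)
    ir = np.zeros(n)

    # Direct sound: shortest path, 1/r amplitude falloff.
    d_idx = int(direct_dist / SPEED_OF_SOUND * SAMPLE_RATE)
    ir[d_idx] += 1.0 / direct_dist

    # Early reflections: longer paths, weakened by surface absorption
    # (the 0.7 factor here is an arbitrary placeholder).
    for dist in reflection_dists:
        idx = int(dist / SPEED_OF_SOUND * SAMPLE_RATE)
        ir[idx] += 0.7 / dist

    # Reverberation: dense noise decaying by 60 dB over rt60 seconds.
    t = np.arange(n) / SAMPLE_RATE
    tail = np.random.default_rng(0).standard_normal(n)
    tail *= 10 ** (-3.0 * t / rt60)  # reaches -60 dB at t = rt60
    tail[:d_idx] = 0.0               # nothing arrives before the direct sound
    ir += 0.02 * tail
    return ir

ir = toy_impulse_response(2.0, [3.5, 4.2, 5.0], rt60=0.6)
```

Convolving a dry source signal with such a response is what makes it sound like it was played in the room.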
Until recently, most audio rendering systems for VR and games have only rendered the direct sound component accurately. Yet surprisingly, in most rooms there is far more reflected and reverberant sound energy than direct sound energy. To get more realistic audio, a sound designer would have to manually add reverb zones to every location in a virtual environment. This is a time-consuming process that requires a lot of parameter tweaking, manual effort, and expertise to get good results.
At Facebook Reality Labs (FRL), we’ve been working over the last year to create high-quality acoustic simulation technologies that can automatically generate reflections and reverberation based on an environment’s geometry. Today, we want to share how researchers on the audio team at FRL have approached this problem.
When Research Team Manager Ravish Mehra started the audio team at FRL four years ago, he envisioned creating a VR world where the virtual audio is perceptually indistinguishable from real-world audio. He knew the first-order research problems that he would have to tackle to achieve this future would be high-quality spatial audio and efficient room acoustics. Over the next few years, he started a large research effort to solve the spatial audio problem while looking for the right person to join his team to solve the room acoustics problem.
“Solving the room acoustics problem is extremely computationally expensive, and I knew that accurately simulating the acoustics of the environment would not be enough,” Mehra says. “Any approach we came up with would need to satisfy the tight computational and memory constraints imposed by real-time VR applications.”
A unique opportunity arose in summer 2017 when Carl Schissler finished his University of North Carolina at Chapel Hill doctorate on that very same topic. Schissler had completed two summer internships on the FRL audio team with Mehra as his intern mentor and was a perfect fit for the open role of room acoustics lead researcher.
“When I started at Facebook Reality Labs last year, I was given the task of creating a system that could simulate all of these complex acoustics in real time,” Schissler explains. “I’ve wanted to create better audio for games since I was very young. Back then, I would modify my favorite games by adding reverb to the sound effects in an effort to make them more atmospheric. Now, years later, I’m thrilled to have the opportunity to work on this technology that has the potential to have a huge impact on sound quality in VR.”
The FRL audio team’s psychoacoustics group led by Research Science Manager Philip Robinson also played a key role in the project. Postdoctoral Research Scientist Sebastià V. Amengual performed experiments to determine which aspects of the acoustic simulation were most important to simulate accurately. With a solid psychoacoustic foundation, the FRL audio team was able to do perceptual evaluation of new audio technologies to inform future development.
The biggest obstacle to realistic simulation of acoustics is the computational complexity involved. There are a number of different existing simulation techniques based on numerical wave solvers or geometric algorithms, but none of them are efficient enough to run in real time on current hardware. A fast multicore CPU or GPU would be required to make previous approaches run fast enough, and even then they would only be able to simulate a handful of sources at a time. Add in a game engine doing all kinds of graphics, physics, AI, and scripting at the same time, and you can see how difficult it is to get the necessary resources.
The typical way to sidestep this problem is to do a long precomputation to simulate the acoustic responses for every pair of listener and source locations. At runtime, the response for each source can be interpolated from that data and used to filter the source’s audio. In practice, this adds up to a huge amount of data for non-trivial scenes. Another drawback is that, since all of the acoustic responses are precomputed, there cannot be any dynamic scene elements that change the sound. This means that shutting a door won’t stop you from hearing a sound source, and destructible or user-created environments are totally out of the question.
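To make the precomputation approach concrete, here is a minimal sketch of the runtime half: an acoustic parameter (RT60 is used here as a stand-in) is sampled offline on a grid of listener probe positions, then bilinearly interpolated at the listener's runtime position. The grid values are placeholders, not output from a real solver.

```python
import numpy as np

# Hypothetical precomputed data: RT60 (seconds) sampled on a 2D grid
# of listener probe positions, as an offline wave solver might produce.
grid_x = np.linspace(0.0, 10.0, 11)  # probe x coordinates (m)
grid_y = np.linspace(0.0, 8.0, 9)    # probe y coordinates (m)
rt60_grid = 0.3 + 0.05 * np.add.outer(grid_x, grid_y)  # placeholder values

def interpolate_rt60(x, y):
    """Bilinearly interpolate a precomputed acoustic parameter at a
    runtime listener position (x, y)."""
    i = int(np.clip(np.searchsorted(grid_x, x) - 1, 0, len(grid_x) - 2))
    j = int(np.clip(np.searchsorted(grid_y, y) - 1, 0, len(grid_y) - 2))
    tx = (x - grid_x[i]) / (grid_x[i + 1] - grid_x[i])
    ty = (y - grid_y[j]) / (grid_y[j + 1] - grid_y[j])
    top = (1 - tx) * rt60_grid[i, j] + tx * rt60_grid[i + 1, j]
    bot = (1 - tx) * rt60_grid[i, j + 1] + tx * rt60_grid[i + 1, j + 1]
    return (1 - ty) * top + ty * bot
```

The drawbacks described above follow directly from this structure: the grid must be stored for every probe position, and because its values are baked offline, nothing at runtime can change them.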
At FRL, our challenge was to develop an approach that was able to render high-quality audio for complex scenes while using as few compute and memory resources as possible. The bar was high—typical games may have hundreds of concurrent sound sources that need to be simulated, and the compute budget is extremely tight. Furthermore, the simulation needed to be dynamic so that it could enable the widest range of immersive audio experiences, unburdened by long precomputation times.
To solve this challenge, Schissler spent almost a year perfecting the simulation engine. “I had to leverage every trick and optimization I could think of to build a system with the required capabilities,” he notes.
To efficiently compute the propagation of sound within a 3D environment, the researchers made use of an advanced ray tracing algorithm. Traditional acoustic ray tracing would require tracing many millions of rays per second, necessitating a large amount of computation. Optimizations developed by Schissler allowed the number of rays to be greatly reduced while maintaining high quality and enabling dynamic scene elements. The largest issue when using stochastic ray tracing is the presence of noise that can lead to audio artifacts. In order to deal with this, the researchers developed clever noise reduction algorithms to filter out the noise in the simulation results.
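The noise problem and its filtering can be illustrated with a highly simplified stochastic model, which is not FRL's algorithm: each ray loses a fixed fraction of its energy per surface bounce, travel distances between bounces are drawn from an exponential distribution around an assumed mean free path, and a moving average stands in for the real noise-reduction step.

```python
import numpy as np

rng = np.random.default_rng(42)
SPEED = 343.0         # m/s
ABSORPTION = 0.3      # assumed average surface absorption coefficient
MEAN_FREE_PATH = 4.0  # m, assumed; depends on room volume and surface area

def energy_decay_histogram(n_rays=2000, n_bounces=60, bin_s=0.005, dur=1.0):
    """Stochastic ray-tracing sketch: sum per-ray energy into time bins
    to estimate the room's energy decay curve. With few rays, the
    estimate is noisy (Monte Carlo variance)."""
    bins = np.zeros(int(dur / bin_s))
    for _ in range(n_rays):
        energy, dist = 1.0 / n_rays, 0.0
        for _ in range(n_bounces):
            dist += rng.exponential(MEAN_FREE_PATH)
            energy *= 1.0 - ABSORPTION
            t = dist / SPEED
            if t >= dur:
                break
            bins[int(t / bin_s)] += energy
    return bins

def smooth(bins, width=5):
    """Moving-average filter: a crude stand-in for the noise-reduction
    step that suppresses Monte Carlo variance before audio rendering."""
    kernel = np.ones(width) / width
    return np.convolve(bins, kernel, mode="same")

decay = smooth(energy_decay_histogram())
```

Tracing fewer rays makes the raw histogram noisier, which is exactly the quality-versus-cost trade-off the optimizations above address.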
Another big problem came about when the number of sound sources in the scene grew large. In a naive implementation, the computation time would increase in proportion to the number of sources. One of the key developments that makes the new technology feasible is a perceptually driven, dynamic prioritization and source clustering system. By developing smart heuristics that are able to cluster unimportant or distant sources together, the researchers have been able to dramatically reduce the computation time in very complex scenes.
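The clustering idea can be sketched with a simple heuristic of my own construction (the real system's prioritization is perceptually driven and more sophisticated): nearby sources are kept individual, while distant sources are merged per spatial grid cell and replaced by a representative at the cluster centroid.

```python
import math

def cluster_sources(sources, listener, near_radius=5.0, cell=4.0):
    """Toy clustering heuristic: sources close to the listener are
    simulated individually; distant ones are merged per spatial cell,
    so simulation cost grows with the number of clusters, not sources.
    `sources` is a list of (x, y, z) positions."""
    individual, clusters = [], {}
    for pos in sources:
        if math.dist(pos, listener) <= near_radius:
            individual.append(pos)  # full-quality simulation
        else:
            key = tuple(int(c // cell) for c in pos)  # grid cell id
            clusters.setdefault(key, []).append(pos)
    # Each far cluster is replaced by one representative at its centroid.
    merged = [tuple(sum(c) / len(c) for c in zip(*group))
              for group in clusters.values()]
    return individual, merged
```

With hundreds of distant sources sharing a few cells, the simulator only pays for a handful of representatives.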
Using the innovations developed at FRL, the researchers were able to meet the initial goals of the project and deliver a working prototype to the Oculus Audio SDK team led by Spatial Audio Tech Lead Robert Heitkamp. At OC5, Audio Design Manager Tom Smurdon and Software Engineer Pete Stirling presented this system in action. During their talk, Smurdon, a veteran of the game audio industry, gave his opinion on the prototype: “You will know when you’re next to a wall without even seeing anything. You can feel everything—it’s amazing. I’m super enthused and happy about where they are at right now.”
“When you hear a realistic audio simulation for the first time in VR, it’s almost uncanny how much it enhances immersion,” adds Schissler. “Authentic audio rendering can even synergistically help make visuals seem better.”
One of the main goals the team had when developing this tech was to empower sound designers and make it easy for them to create realistic audio experiences in VR. They also wanted to provide artists with parameters to help them achieve their creative visions. “Sometimes you don’t want things to sound 100% realistic,” says Schissler. “During dialogue, you might want to lower the amount of reverberation to make sure the characters are understandable. This new tech has that level of flexibility.”
Now, instead of designers having to set up reverb zones with a complex set of parameters for every room, they only need to assign material properties to the geometry. The dynamic nature of the simulation also benefits content creators—artists can tweak parameters while the simulation is running, greatly reducing iteration times compared to precomputed acoustic simulations.
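As an illustration of this material-driven workflow, here is a hypothetical sketch (the material names, bands, and coefficients are invented for illustration, not taken from the Audio SDK): each surface gets a material, and the room's effective absorption falls out as an area-weighted average.

```python
# Hypothetical material table: per-band absorption coefficients
# (fraction of energy absorbed per reflection) at a few octave bands.
MATERIALS = {
    "concrete": {"500Hz": 0.02, "1kHz": 0.03, "2kHz": 0.04},
    "carpet":   {"500Hz": 0.30, "1kHz": 0.45, "2kHz": 0.55},
    "glass":    {"500Hz": 0.05, "1kHz": 0.04, "2kHz": 0.03},
}

def mesh_absorption(surface_materials, band):
    """Area-weighted average absorption for a room at one band.
    `surface_materials` maps material name -> total surface area (m^2)."""
    total_area = sum(surface_materials.values())
    absorbed = sum(area * MATERIALS[name][band]
                   for name, area in surface_materials.items())
    return absorbed / total_area
```

Swapping a surface from concrete to carpet changes the averages the simulation sees, and the reflections and reverb follow automatically, with no reverb zones to retune.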
Now that the FRL audio team has achieved their goal of developing an efficient simulation engine, they’re focusing their efforts on improving the technology to simulate additional acoustic phenomena. There are many acoustic phenomena that are difficult to simulate, such as diffraction, transmission, and standing waves. The next steps are to research new methods to compute those effects efficiently. “I hope that we can continue to push forward the state of the art in audio,” says Schissler. “I’m excited for the time when all games have this level of audio fidelity.”
In his OC5 keynote, Michael Abrash described the problems that must be solved in order to generate authentic audio for VR and AR. In addition to simulation of room acoustics, another challenge in spatial audio is the personalization of head-related transfer functions (HRTFs), by which audio can be generated so that its 3D spatial cues are tailored to each individual listener. Abrash explained that HRTF personalization is a problem that may take longer than expected to solve. On the bright side, including acoustic simulation of the environment may help to improve immersion until practical HRTF personalization can be realized.
— The Oculus Team