And yet even now, after 150 years of development, the sound we hear from even a high-end audio system falls far short of what we hear when we are physically present at a live music performance. At such an event, we are in a natural sound field and can readily perceive that the sounds of different instruments come from different locations, even when the sound field is criss-crossed with mixed sound from multiple instruments. There’s a reason why people pay considerable sums to hear live music: It is more enjoyable and exciting, and it can generate a bigger emotional impact.
Today, researchers, companies, and entrepreneurs, including ourselves, are closing in at last on recorded audio that truly re-creates a natural sound field. The group includes big companies, such as Apple and Sony, as well as smaller firms, such as Creative. Netflix recently disclosed a partnership with Sennheiser under which the network has begun using a new system, Ambeo 2-Channel Spatial Audio, to heighten the sonic realism of such TV shows as “Stranger Things” and “The Witcher.”
There are now at least half a dozen different approaches to producing highly realistic audio. We use the term “soundstage” to distinguish our work from other audio formats, such as the ones referred to as spatial audio or immersive audio. These can represent sound with more spatial effect than ordinary stereo, but they do not typically include the detailed sound-source location cues that are needed to reproduce a truly convincing sound field.
We believe that soundstage is the future of music recording and reproduction. But before such a sweeping revolution can occur, it will be necessary to overcome an enormous obstacle: that of conveniently and inexpensively converting the countless hours of existing recordings, regardless of whether they’re mono, stereo, or multichannel surround sound (5.1, 7.1, and so on). No one knows exactly how many songs have been recorded, but according to the entertainment-metadata concern Gracenote, more than 200 million recorded songs are available now on planet Earth. Given that the average duration of a song is about 3 minutes, this is the equivalent of about 1,100 years of music.
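The estimate above can be checked with a few lines of arithmetic. The sketch below uses only the two figures quoted in the text; the 365-day year is a simplifying assumption.

```python
# Figures quoted in the text above.
SONGS = 200_000_000        # recorded songs, per Gracenote
MINUTES_PER_SONG = 3       # approximate average song duration

# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 60 * 24 * 365

total_minutes = SONGS * MINUTES_PER_SONG
years_of_music = total_minutes / MINUTES_PER_YEAR

# Prints a figure a little above 1,100 years.
print(f"{years_of_music:,.0f} years of continuous music")
```

The result lands slightly above 1,100 years, consistent with the rounded figure in the text.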
After separating a recording into its component tracks, the next step is to remix them into a soundstage recording. This is done by a soundstage signal processor, which performs a complex computational function to generate the output signals that drive the speakers and produce the soundstage audio. The inputs to the processor include the isolated tracks, the physical locations of the speakers, and the desired locations of the listener and sound sources in the re-created sound field. The outputs are multitrack signals, one for each channel, to drive the multiple speakers.
The sound field can be in a physical space, if it is generated by speakers, or in a virtual space, if it is generated by headphones or earphones. The function performed within the soundstage processor is based on computational acoustics and psychoacoustics, and it takes into account sound-wave propagation and interference in the desired sound field and the HRTFs for the listener and the desired sound field.
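To make the shape of that computation concrete, here is a minimal sketch of a processor that takes isolated tracks, desired source locations, and speaker locations, and returns one signal per speaker. It is not our actual algorithm: it replaces the full acoustic and psychoacoustic model with a crude stand-in that applies only a travel-time delay and a 1/distance attenuation for each source-to-speaker path. The function name `render_to_speakers`, the 48-kilohertz sample rate, and the 2D coordinates are all assumptions made for illustration.

```python
import math

SPEED_OF_SOUND = 343.0   # metres per second in air at roughly 20 °C
SAMPLE_RATE = 48_000     # samples per second (assumed)

def render_to_speakers(tracks, source_positions, speaker_positions):
    """Return one output signal per speaker (hypothetical helper).

    tracks            -- isolated tracks, each a list of float samples
    source_positions  -- desired (x, y) location in metres for each track
    speaker_positions -- (x, y) location in metres of each physical speaker
    """
    # Work out delay (in samples) and gain for every source-to-speaker path.
    paths = []
    for speaker in speaker_positions:
        row = []
        for source in source_positions:
            # Clamp very short paths so the gain stays finite.
            distance = max(math.dist(source, speaker), 0.1)
            delay = round(distance / SPEED_OF_SOUND * SAMPLE_RATE)
            row.append((delay, 1.0 / distance))
        paths.append(row)

    longest_track = max(len(track) for track in tracks)
    longest_delay = max(delay for row in paths for delay, _ in row)

    outputs = []
    for row in paths:
        out = [0.0] * (longest_track + longest_delay)
        # Each speaker signal is the sum of delayed, attenuated tracks.
        for track, (delay, gain) in zip(tracks, row):
            for n, sample in enumerate(track):
                out[n + delay] += gain * sample
        outputs.append(out)
    return outputs

# Two sources rendered to three speakers: the counts need not match.
vocals = [1.0, 0.5, 0.25]
guitar = [0.3, 0.3, 0.3]
speaker_feeds = render_to_speakers(
    tracks=[vocals, guitar],
    source_positions=[(-1.0, 2.0), (1.0, 2.0)],
    speaker_positions=[(-2.0, 0.0), (0.0, 0.0), (2.0, 0.0)],
)
print(len(speaker_feeds))  # one feed per speaker
```

A real processor would also account for interference between the speaker feeds and for the listener's HRTFs, which this sketch leaves out entirely.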
For example, if the listener is going to use earphones, the soundstage processor selects a set of HRTFs based on the configuration of desired sound-source locations, then uses the selected HRTFs to filter the isolated sound-source tracks. Finally, the processor combines all the HRTF-filtered outputs to generate the left and right tracks for earphones. If the music is going to be played back on speakers, at least two are needed, but the more speakers, the better the sound field. The number of sound sources in the re-created sound field can be more or less than the number of speakers.
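The earphone case described above can be sketched in a few lines. In the time domain, an HRTF corresponds to a head-related impulse response (HRIR), and filtering a track with it is a convolution. The sketch below assumes each source has already been matched to a left-ear and right-ear HRIR; the tiny impulse responses are invented toy values, not measured data, and the function names are hypothetical.

```python
def convolve(signal, impulse_response):
    """Filter a signal with an impulse response (full direct convolution)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(tracks, hrir_pairs):
    """Filter each track with its (left, right) HRIR pair, then sum per ear."""
    length = max(
        len(track) + max(len(h_left), len(h_right)) - 1
        for track, (h_left, h_right) in zip(tracks, hrir_pairs)
    )
    left = [0.0] * length
    right = [0.0] * length
    for track, (h_left, h_right) in zip(tracks, hrir_pairs):
        for n, value in enumerate(convolve(track, h_left)):
            left[n] += value
        for n, value in enumerate(convolve(track, h_right)):
            right[n] += value
    return left, right

# Toy HRIRs (made up): the far ear hears the source later and quieter.
SOURCE_ON_LEFT = ([1.0], [0.0, 0.6])    # (left-ear HRIR, right-ear HRIR)
SOURCE_ON_RIGHT = ([0.0, 0.6], [1.0])

vocals = [1.0, 0.5]
guitar = [0.2, 0.2]
left_ear, right_ear = render_binaural(
    [vocals, guitar], [SOURCE_ON_LEFT, SOURCE_ON_RIGHT]
)
```

Measured HRIRs run to hundreds of samples, so a practical implementation would use FFT-based convolution rather than the direct loop shown here.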
We released our first soundstage app, for the iPhone, in 2020. It lets listeners configure, listen to, and save soundstage music in real time—the processing causes no discernible time delay. The app, called 3D Musica, converts stereo music from a listener’s personal music library, the cloud, or even streaming music to soundstage in real time. (For karaoke, the app can remove vocals, or output any isolated instrument.)
Earlier this year, we opened a Web portal, 3dsoundstage.com, that provides all the features of the 3D Musica app in the cloud plus an application programming interface (API) making the features available to streaming music providers and even to users of any popular Web browser. Anyone can now listen to music in soundstage audio on essentially any device.
When sound travels to your ears, unique characteristics of your head—its physical shape, the shape of your outer and inner ears, even the shape of your nasal cavities—change the audio spectrum of the original sound.
We also developed separate versions of the 3D Soundstage software for vehicles and home audio systems and devices to re-create a 3D sound field using two, four, or more speakers. Beyond music playback, we have high hopes for this technology in videoconferencing. Many of us have had the fatiguing experience of attending videoconferences in which we had trouble hearing other participants clearly or were confused about who was speaking. With soundstage, the audio can be configured so that each person is heard coming from a distinct location in a virtual room. Or the “location” can simply be assigned depending on the person’s position in the grid typical of Zoom and other videoconferencing applications. For some, at least, videoconferencing will be less fatiguing and speech will be more intelligible.
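The grid-based assignment mentioned above could be as simple as mapping each tile's column to a direction in front of the listener. The sketch below is one hypothetical way to do that: the function names, the row-by-row tile numbering, and the 120-degree spread are all assumptions chosen for illustration, not a description of any videoconferencing product.

```python
def grid_azimuths(columns, spread_degrees=120.0):
    """Spread grid columns evenly across an arc; negative is the listener's left."""
    if columns == 1:
        return [0.0]
    step = spread_degrees / (columns - 1)
    return [-spread_degrees / 2 + i * step for i in range(columns)]

def participant_azimuth(tile_index, columns, spread_degrees=120.0):
    """Azimuth for a tile numbered row by row, starting at 0 in the top-left."""
    return grid_azimuths(columns, spread_degrees)[tile_index % columns]

# A three-column grid: left column, centre column, right column.
print(grid_azimuths(3))            # [-60.0, 0.0, 60.0]
print(participant_azimuth(4, 3))   # second row, middle column: 0.0
```

Each azimuth would then select the HRTFs, or the speaker feeds, used to place that participant's voice.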
Just as audio moved from mono to stereo, and from stereo to surround and spatial audio, it is now starting to move to soundstage. In those earlier eras, audiophiles evaluated a sound system by its fidelity, based on such parameters as bandwidth, harmonic distortion, data resolution, response time, lossless or lossy data compression, and other signal-related factors. Now, soundstage can be added as another dimension to sound fidelity—and, we dare say, the most fundamental one. To human ears, the impact of soundstage, with its spatial cues and gripping immediacy, is much more significant than incremental improvements in fidelity. This extraordinary feature offers capabilities previously beyond the experience of even the most deep-pocketed audiophiles.
Technology has fueled previous revolutions in the audio industry, and it is now launching another one. Artificial intelligence, virtual reality, and digital signal processing are tapping in to psychoacoustics to give audio enthusiasts capabilities they’ve never had. At the same time, these technologies are giving recording companies and artists new tools that will breathe new life into old recordings and open up new avenues for creativity. At last, the century-old goal of convincingly re-creating the sounds of the concert hall has been achieved.
This article appears in the October 2022 print issue as “How Audio Is Getting Its Groove Back.”