Audio Source Remixing

This post describes our paper “Binaural Audio Source Remixing for Microphone Array Listening Devices” presented at ICASSP 2020. You can read the paper here or watch the video presentation here.

Microphone arrays are important tools for spatial sound processing. Traditionally, most methods for spatial sound capture can be classified as either beamforming, which tries to isolate a sound coming from a single direction, or source separation, which tries to split a recording of several sounds into its component parts. Beamforming and source separation are useful in crowded, noisy environments with many sound sources, and are widely used in speech recognition and teleconferencing systems.

Since microphone arrays are so useful in noisy environments, we would expect them to work well in hearing aids and other augmented listening applications. Researchers have been building microphone-array hearing aids for more than 30 years, and laboratory experiments have consistently shown that they can reduce noise and improve intelligibility, but there has never been a commercially successful listening device with a powerful microphone array. Why not?

The problem may be that most researchers have approached listening devices as if they were speech recognition or teleconferencing systems, designing beamformers that try to isolate a single sound and remove all the others. They promise to let the listener hear the person across from them in a crowded restaurant and silence everyone else. But unlike computers, humans are used to hearing multiple sounds at once, and our brains can do a good job separating sound sources on their own. Imagine seeing everyone’s lips move but not hearing any sound! If a listening device tries to focus on only one sound, it can seem unnatural to the listener and introduce distortion that makes it harder, not easier, to hear.

A source remixing system changes the relative levels of sounds in a mixture while preserving their spatial cues.

This paper proposes a new type of array processing for listening devices: source remixing. Instead of trying to isolate or separate sound sources, the system tries to change their relative levels in a way that sounds natural to the listener. In a good remixing system, it will seem as if real-world sounds are louder or quieter than before.
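To make the idea concrete, here is a minimal sketch (not the algorithm from the paper) of what “remixing” means once the binaural component of each source has somehow been estimated: each source’s left/right image is scaled by its own gain and the results are summed, so its level changes while its spatial cues stay intact. All variable names are hypothetical.

```python
import numpy as np

def remix_binaural(source_images, gains):
    """Sketch of source remixing: scale each source's binaural image and sum.

    source_images: (num_sources, 2, num_samples) array, the estimated
                   left/right components of each source as heard at the ears.
    gains:         per-source gains, e.g. 2.0 to boost a talker, 0.3 to soften noise.
    """
    gains = np.asarray(gains)[:, None, None]       # broadcast over ears and time
    return np.sum(gains * source_images, axis=0)   # remixed (2, num_samples) signal

# Example: boost source 0, leave source 1 unchanged, attenuate background source 2.
# output = remix_binaural(estimated_images, gains=[2.0, 1.0, 0.3])
```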

Continue reading

Deformable Microphone Arrays

This post describes our paper “Motion-Robust Beamforming for Deformable Microphone Arrays,” which won the best student paper award at WASPAA 2019.

When our team designs wearable microphone arrays, we usually test them on our beloved mannequin test subject, Mike A. Ray. With Mike’s help, we’ve shown that large wearable microphone arrays can perform much better than conventional earpieces and headsets for augmented listening applications, such as noise reduction in hearing aids. Mannequin experiments are useful because, unlike a human, Mike doesn’t need to be paid, doesn’t need to sign any paperwork, and doesn’t mind having things duct-taped to his head. There is one major difference between mannequin and human subjects, however: humans move. In our recent paper at WASPAA 2019, which won a best student paper award, we described the effects of this motion on microphone arrays and proposed several ways to address it.

Beamformers, which use spatial information to separate and enhance sounds from different directions, rely on precise knowledge of the distances between microphones. (We don’t actually measure those distances directly; we measure relative time delays between the signals at the different microphones, which depend on those distances.) When a human user turns their head – as humans do constantly and subconsciously while listening – the microphones near the ears move relative to the microphones on the lower body. The distances between microphones therefore change frequently.

In a deformable microphone array, microphones can move relative to each other.

Microphone array researchers have studied motion before, but it is usually the sound source that moves relative to the entire array. For example, a talker might walk around the room. That problem, while challenging, is easier to deal with: we just need to track the direction of the moving source. Deformation of the array itself – that is, relative motion between microphones – is more difficult because there are more moving parts and the changing shape of the array has complicated effects on the signals. In this paper, we mathematically analyzed the effects of deformation on beamformer performance and considered several ways to compensate for it.
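To see why deformation matters, consider how the relative delays follow from the array geometry. The sketch below (a simple far-field model, not the analysis from the paper) computes the delays for a hypothetical wearable array; moving any microphone, as a head turn does, changes the delays that a beamformer relies on.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s at room temperature

def farfield_delays(mic_positions, azimuth_deg):
    """Relative arrival times (seconds) of a far-field plane wave at each microphone.

    mic_positions: (num_mics, 2) array of microphone coordinates in meters.
    azimuth_deg:   direction the sound arrives from, in the horizontal plane.
    """
    azimuth = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(azimuth), np.sin(azimuth)])  # unit vector toward source
    delays = -mic_positions @ direction / SPEED_OF_SOUND
    return delays - delays.min()                              # relative to earliest mic

# Hypothetical wearable array: two mics at the ears, two on the chest (meters).
mics = np.array([[0.00, 0.09], [0.00, -0.09], [-0.15, 0.05], [-0.15, -0.05]])
print(farfield_delays(mics, azimuth_deg=30))
# If the wearer turns their head, the first two rows move and the delays change.
```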

Continue reading

Cooperative Listening Devices

This post describes our paper presented at CAMSAP 2019.

Imagine what it would sound like to listen through someone else’s ears. I don’t mean that in a metaphorical sense. What if you had a headset that would let you listen through microphones in the ears of someone else in the room, so that you can hear what they hear? Better yet, what if your headset was connected to the ears of everyone else in the room? In our group’s latest paper, “Cooperative Audio Source Separation and Enhancement Using Distributed Microphone Arrays and Wearable Devices,” presented this week at CAMSAP 2019, we designed a system to do just that.

Our team is trying to improve the performance of hearing aids and other augmented listening devices in crowded, noisy environments. In spaces like restaurants and bars where there are many people talking at once, it can be difficult for even normal-hearing people to hold a conversation. Microphone arrays can help by spatially separating sounds, so that each user can turn up the sounds they want to hear and turn off the ones they don’t. To do that in a very noisy room, however, we need a large number of microphones that cover a large area.

Illustration of human listeners with headsets, microphone-array appliances, and sound sources spread around a room

Complex listening environments include many different sound sources, but also many microphone-equipped devices. Each listening device tries to enhance a different sound source.

In the past, we’ve built large wearable microphone arrays, with sensors that cover accessories or even the entire body. These arrays can perform much better than conventional earpieces, but they aren’t enough in the most challenging environments. In a large, reverberant room packed with noisy people, we need microphones spread all over the room. Instead of having a compact microphone array surrounded by sound sources, we should have microphones spread around and among the sound sources, helping each listener to distinguish even faraway sounds.

Continue reading

Massive Distributed Microphone Array Dataset

This post describes our new massive distributed microphone array dataset, which is available for download from the Illinois Data Bank and is featured in an upcoming paper at CAMSAP 2019.

Conference room with microphone arrays and loudspeakers

The conference room used for the massive distributed array dataset.

Listening in loud noise is hard: we only have two ears, after all, but a crowded party might have dozens or even hundreds of people talking at once. Our ears are hopelessly outnumbered! Augmented listening devices, however, are not limited by physiology: they could use hundreds of microphones spread all across a room to make sense of the jumble of sounds.

Our world is already filled with microphones. There are multiple microphones in every smartphone, laptop, smart speaker, conferencing system, and hearing aid. As microphone technology and wireless networks improve, it will be possible to place hundreds of microphones throughout crowded spaces to help us hear better. Massive-scale distributed arrays are more useful than compact arrays because they are spread around and among the sound sources. One user’s listening device might have trouble distinguishing between two voices on the other side of the room, but wearable microphones on those talkers can provide excellent information about their speech signals.

Many researchers, including our team, are developing algorithms that can harness information from massive-scale arrays, but there is little publicly available data suitable for source separation and audio enhancement research at such a large scale. To facilitate this research, we have released a new dataset with 10 speech sources and 160 microphones in a large, reverberant conference room.
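As a rough illustration of how such data is typically used, the sketch below forms a multichannel mixture by summing separately recorded sources, keeping the individual recordings as ground truth for evaluating separation algorithms. It assumes one recording per source per microphone; the actual file layout of the dataset may differ.

```python
import numpy as np

def mix_sources(per_source_recordings, gains=None):
    """Form a multichannel mixture from separately recorded sources.

    per_source_recordings: (num_sources, num_mics, num_samples) array, e.g.
                           10 sources x 160 microphones for a dataset like this one.
    gains:                 optional per-source scaling to vary the mixing conditions.
    """
    x = np.asarray(per_source_recordings, dtype=float)
    if gains is not None:
        x = x * np.asarray(gains)[:, None, None]
    mixture = x.sum(axis=0)   # (num_mics, num_samples) observed mixture
    return mixture, x         # keep the scaled source images as ground truth
```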

Continue reading

Augmented Listening at Engineering Open House 2019

Have you ever wondered what it would sound like to listen through sixteen ears? This past March, hundreds of Central Illinois children and families experienced microphone-array augmented listening technology firsthand at the annual Engineering Open House (EOH) sponsored by the University of Illinois College of Engineering. At the event, which attracts thousands of elementary-, middle-, and high-school students and local community members, visitors learned about technologies for enhancing human and machine listening.

Listen up (or down): The technology of directional listening

Our team’s award-winning exhibit introduced visitors to several directional listening technologies, which enhance audio by isolating sounds that come from a certain direction. Directional listening is important when the sounds we want to hear are far away, or when there are many different sounds coming from different directions—like at a crowded open house! There are two ways to focus on sounds from one direction: we can physically block sounds from directions we don’t want, or we can use the mathematical tools of signal processing to cancel out those unwanted sounds. At our exhibit in Engineering Hall, visitors could try both.

Ryan holds up an ear horn at EOH 2019

This carefully designed mechanical listening device is definitely not an oil funnel from the local hardware store.

The oldest and most intuitive listening technology is the ear horn, pictured above. This horn literally funnels sound waves from the direction in which it is pointed. The effect is surprisingly strong, and there is a noticeable difference in the acoustics of the two horns we had on display. The shape of the horn affects both its directional pattern and its effect on different sound wavelengths, which humans perceive as pitch. The toy listening dish shown below operates on the same principle, but also includes an electronic amplifier. The funnels work much better for directional listening, but the spy gadget is the clear winner for style.

This toy listening dish is not very powerful, but it certainly looks cool!

These mechanical hearing aids rely on physical acoustics to isolate sound from one direction. To listen in a different direction, the user needs to physically turn them in that direction. Modern directional listening technology uses microphone arrays, which are groups of microphones spread apart from each other in space. We can use signal processing to compare and combine the signals recorded by the microphones to tell what direction a sound came from or to listen in a certain direction. We can change the direction using software, without physically moving the microphones. With sophisticated array signal processing, we can even listen in multiple directions at once, and can compensate for reflections and echoes in the room.
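The simplest example of this kind of processing is the delay-and-sum beamformer: delay each microphone signal so that sound from the chosen direction lines up across the array, then average. A toy sketch (integer-sample delays only, not a production implementation):

```python
import numpy as np

def delay_and_sum(signals, delays, sample_rate):
    """Steer a microphone array toward one direction by delaying and averaging.

    signals:     (num_mics, num_samples) array of recorded microphone signals.
    delays:      per-microphone arrival-time delays (seconds) for the target
                 direction, e.g. computed from the array geometry.
    sample_rate: sampling rate in Hz.
    """
    num_mics, num_samples = signals.shape
    output = np.zeros(num_samples)
    for m in range(num_mics):
        shift = int(round(delays[m] * sample_rate))  # advance each mic by its delay
        output += np.roll(signals[m], -shift)        # (np.roll wraps at the edges; fine for a toy example)
    return output / num_mics  # sounds from the target direction add up coherently
```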

Continue reading

Acoustic Impulse Responses for Wearable Audio Devices

This post describes our new wearable microphone impulse response data set, which is available for download from the Illinois Data Bank and is the subject of a paper at ICASSP 2019.

Acoustic impulse responses were measured from 24 source angles to 80 points across the body.

Have you ever been at a crowded party and struggled to hear the person next to you? Crowded, noisy places are some of the most difficult listening environments, especially for people with hearing loss. Noisy rooms are also a challenge for electronic listening systems, like teleconferencing equipment and smart speakers that recognize users’ voices. That’s why many conference-room microphone systems and smart speakers use as many as eight microphones instead of just one or two. These arrays of microphones, which are usually laid out in a regular pattern like a circle, let the device focus on sounds coming from one direction and block out other sounds. Arrays work like camera lenses: just as larger lenses can focus light more narrowly, arrays with more microphones spread over a larger area can better distinguish between sounds from different directions.

Wearable microphone arrays

Microphone arrays are also sometimes used in listening devices, including hearing aids and the emerging product category of smart headphones. These array-equipped devices can help users to tune out annoying sounds and focus on what they want to hear. Unfortunately, most hearing aids only have two microphones spaced a few millimeters apart, so they aren’t very good at focusing in one direction. What if hearing aids—or smart headphones, or augmented reality headsets—had a dozen microphones instead of just two? What if they had one hundred microphones spread all over the user’s body, attached to their clothing and accessories? In principle, a large wearable array could provide far better sound quality than listening devices today.

Over the years, there have been several papers about wearable arrays: vests, necklaces, eyeglasses, helmets. It’s also a popular idea on crowdfunding websites. But there have been no commercially successful wearable microphone array products. Although several engineers have built these arrays, no one has rigorously studied their design tradeoffs. How many microphones do we need? How far apart should they be? Does it matter what clothes the user is wearing? How much better are they than conventional listening devices? We developed a new data set to help researchers answer these questions and to explore the possibilities of wearable microphone arrays.
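As an example of how such a data set can be used, one common approach is to simulate a wearable-array recording by convolving dry speech with the measured impulse responses from a chosen source angle to each microphone position. The sketch below assumes the impulse responses have already been loaded into an array; the loading step and variable names are hypothetical.

```python
import numpy as np

def simulate_array_recording(dry_speech, impulse_responses):
    """Simulate a multichannel recording from measured acoustic impulse responses.

    dry_speech:        (num_samples,) anechoic speech signal.
    impulse_responses: (num_mics, ir_length) impulse responses from one source
                       angle to each wearable microphone position.
    """
    return np.stack([np.convolve(dry_speech, h) for h in impulse_responses])

# Multiple talkers at different angles can be simulated by summing the convolved
# signals for each source, giving ground truth for array-processing experiments.
```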

Continue reading

What is augmented listening?

Augmented listening systems “remix” the sounds we perceive around us, making some louder and some quieter.

I am one of millions of people who suffer from hearing loss. For my entire life I’ve known the frustration of asking people to repeat themselves, struggling to communicate over the phone, and skipping social events because I know they’ll be too noisy. Hearing aids do help, but they don’t work well in the noisy, crowded situations where I need them the most. That’s why I decided to devote my PhD thesis to improving the performance of hearing aids in noisy environments.

As my research progressed, I realized that this problem is not limited to hearing aids, and that the technologies I am developing could also help people who don’t suffer from hearing loss. Over the last few years, there has been rapid growth in a product category that I call augmented listening (AL): technologies that enhance human listening abilities by modifying the sounds people hear in real time. Augmented listening devices include:

  • traditional hearing aids, which are prescribed by a clinician to patients with hearing loss;
  • low-cost personal sound amplification products (PSAPs), which are ostensibly for normal-hearing listeners;
  • advanced headphones, sometimes called “hearables,” that incorporate listening enhancement as well as features like heart-rate sensing; and
  • augmented- and mixed-reality headsets, which supplement real-world sound with extra information.

These product categories have been converging in recent years as hearing aids add new consumer-technology features like Bluetooth and headphone products promise to enhance real-world sounds. Recent regulatory changes that allow hearing aids to be sold over the counter will also help to shake up the market.

Continue reading