Acoustic Impulse Responses for Wearable Audio Devices


This post describes our new wearable microphone impulse response data set, which is available for download from the Illinois Data Bank and is the subject of a paper at ICASSP 2019.

Acoustic impulse responses were measured from 24 source angles to 80 points across the body.

Have you ever been at a crowded party and struggled to hear the person next to you? Crowded, noisy places are some of the most difficult listening environments, especially for people with hearing loss. Noisy rooms are also a challenge for electronic listening systems, like teleconferencing equipment and smart speakers that recognize users’ voices. That’s why many conference room microphones and smart speakers use as many as eight microphones instead of just one or two. These arrays of microphones, which are usually laid out in a regular pattern like a circle, let the device focus on sounds coming from one direction and block out other sounds. Arrays work like camera lenses: larger lenses can focus light more narrowly, and arrays with more microphones spread out over a larger area can better distinguish between sounds from different directions.

Wearable microphone arrays

Microphone arrays are also sometimes used in listening devices, including hearing aids and the emerging product category of smart headphones. These array-equipped devices can help users to tune out annoying sounds and focus on what they want to hear. Unfortunately, most hearing aids only have two microphones spaced a few millimeters apart, so they aren’t very good at focusing in one direction. What if hearing aids—or smart headphones, or augmented reality headsets—had a dozen microphones instead of just two? What if they had one hundred microphones spread all over the user’s body, attached to their clothing and accessories? In principle, a large wearable array could provide far better sound quality than listening devices today.

Over the years, there have been several papers about wearable arrays: vests, necklaces, eyeglasses, helmets. It’s also a popular idea on crowdfunding websites. But there have been no commercially successful wearable microphone array products. Although several engineers have built these arrays, no one has rigorously studied their design tradeoffs. How many microphones do we need? How far apart should they be? Does it matter what clothes the user is wearing? How much better are they than conventional listening devices? We developed a new data set to help researchers answer these questions and to explore the possibilities of wearable microphone arrays.

Continue reading

What is augmented listening?


Augmented listening systems “remix” the sounds we perceive around us, making some louder and some quieter.

I am one of millions of people who suffer from hearing loss. For my entire life I’ve known the frustration of asking people to repeat themselves, struggling to communicate over the phone, and skipping social events because I know they’ll be too noisy.  Hearing aids do help, but they don’t work well in the noisy, crowded situations where I need them the most. That’s why I decided to devote my PhD thesis to improving the performance of hearing aids in noisy environments.

As my research progressed, I realized that this problem is not limited to hearing aids, and that the technologies I am developing could also help people who don’t suffer from hearing loss. Over the last few years, there has been rapid growth in a product category that I call augmented listening (AL): technologies that enhance human listening abilities by modifying the sounds they hear in real time. Augmented listening  devices include:

  • traditional hearing aids, which are prescribed by a clinician to patients with hearing loss;
  • low-cost personal sound amplification products (PSAPs), which are ostensibly for normal-hearing listeners;
  • advanced headphones, sometimes called “hearables,” that incorporate listening enhancement as well as features like heart-rate sensing; and
  • augmented- and mixed-reality headsets, which supplement real-world sound with extra information.

These product categories have been converging in recent years as hearing aids add new consumer-technology features like Bluetooth and headphone products promise to enhance real-world sounds. Recent regulatory changes that allow hearing aids to be sold over the counter will also help to shake up the market.

Continue reading

Massive Distributed Microphone Array Dataset

This post describes our new massive distributed microphone array dataset, which is available for download from the Illinois Data Bank and is featured in an upcoming paper at CAMSAP 2019.

Conference room with microphone arrays and loudspeakers

The conference room used for the massive distributed array dataset.

Listening in loud noise is hard: we only have two ears, after all, but a crowded party might have dozens or even hundreds of people talking at once. Our ears are hopelessly outnumbered! Augmented listening devices, however, are not limited by physiology: they could use hundreds of microphones spread all across a room to make sense of the jumble of sounds.

Our world is already filled with microphones. There are multiple microphones in every smartphone, laptop, smart speaker, conferencing system, and hearing aid. As microphone technology and wireless networks improve, it will be possible to place hundreds of microphones throughout crowded spaces to help us hear better. Massive-scale distributed arrays are more useful than compact arrays because they are spread around and among the sound sources. One user’s listening device might have trouble distinguishing between two voices on the other side of the room, but wearable microphones on those talkers can provide excellent information about their speech signals.

Many researchers, including our team, are developing algorithms that can harness  information from massive-scale arrays, but there is little publicly available data suitable for source separation and audio enhancement research at such a large scale. To facilitate this research, we have released a new dataset with 10 speech sources and 160 microphones in a large, reverberant conference room.

Continue reading

Studio-Quality Recording Devices for Smart Home Data Collection

Alexa, Google Home, and Facebook smart devices are becoming more and more commonplace in the home. Although many individuals only use these smart devices to ask for the time or weather, they provide an important edge controller for the Internet of Things infrastructure.

Unknown to some consumers, Alexa and other smart devices contain multiple microphones. Alexa uses these microphones to determine the direction of the speaker and displays a light almost as if to “face” the user. This localization function is also very important for processing whatever is said after “Alexa” or “OK Google”.

In our research lab, this kind of localization is important, and we hope to learn more from individuals’ interactions with their home smart speakers. The final details of the experiments we hope to run are not yet concrete. However, we know that we will need our own Alexa-like device that can make studio-quality recordings with a number of different channels.

This project was led by Uriah Jones and myself, with input from all members of the team. The original goal was to design a microphone array and speaker housing, using the Lavalier microphones that are currently being used in experiments. We looked at the different designs of the commercial products and tried to identify the changes made between each generation of product. Although we found a shift from sharp to rounded edges and the addition of cloth material around the microphones, we decided to use the full-size Amazon Alexa tower as our design inspiration.

Similar to the Alexa tower, we planned on using a ~2-inch driver for superior sound quality over a smaller speaker. This forced us to reconsider our original design, in which the microphones surrounded the speaker. In our final design, which can be seen below, the microphones sit much lower on the device, under the speaker housing. This is a deviation from the Alexa tower speaker, which has all seven of its microphones located at the top of the device.

Because our experiments are not completely finalized yet, we wanted to make our speaker reconfigurable so that it can record data with a number of different channels. We decided on a maximum of 16 channels, because that is the number of Lavalier microphones that we have.

Our first challenge in this project was to create the housing for the small microphones. To ensure easy setup and teardown of the system, we had to create a way to place and remove the mics without damaging them. After taking measurements, we 3D printed 1/16th of the microphone housing in order to test the easy-release mic holder.

Our first iteration was too tight, and highlighted an important issue: the cable was not stiff enough to be able to push the mic out without damaging it. Our initial hope was that the curved shape of the housing would allow us to simply apply light pressure to the cable, and the mic would pop back out.

Our second 3D-printed piece had the perfect dimensions for holding the microphone, although we still could not push the mic out without a lot of pressure.

Ensuring a snug fit for the microphone is more important, so we decided an easy fix would be to attach a thread to each mic that allows us to pull it out without any damage. A future iteration of the design might include spring-loaded clamps to hold the mics; however, we are more focused on completing a proof of concept before we begin work on more intricate designs.

Currently, we are planning to do a full print in order to see how the pieces fit together and whether the tolerances are correct. Please check back in the future for more updates!

Sound Source Localization

Imagine you are at a noisy restaurant: you hear the clanging of dishes, hearty laughs from the patrons around you, and the musical ambience, and you are struggling to hear your friend from across the table. Wouldn’t it be nice if the only sound you heard was your friend’s voice? That is the problem that sound source localization can help solve.

Sound source localization, as you might have guessed, is the process of identifying the direction that a sound of interest is coming from. It is how your Amazon Echo Dot identifies who is speaking to it with the little ring at the top. For Engineering Open House, we wanted to create a device that mimics that colorful ring in a fun, creative way. Instead of a light ring, we used a mannequin head that turns toward audience members when they speak to it.

My colleague Manan and I designed “Alexander”, the spinning head that can detect speech.  We knew our system had to contain a microphone array, a processor to control the localization system and a motor to turn the mannequin head. Our choices of each component are as follows:


The Microphone Array: The Matrix Creator

The Matrix Creator is a powerful development board that contains an FPGA, sensors, an array of 8 microphones, and other interesting features. Although this hobbyist board has numerous functions that can drive intensive prototypes, we simply used it for its easy-to-interface microphone array. The Matrix company’s extensive library and helpful documentation played a big part in making this project possible.


The Processor: Raspberry Pi 3

The Raspberry Pi is a cheap yet extremely useful computer that runs the Raspbian OS and is an excellent prototyping system. Its widespread use and open-source community make it relatively easy to use, and it serves as the “brain” of our project. It was also an easy choice because of how easily it connects to the Matrix Creator through the GPIO pins.


The Motor: Servo

We chose to spin the head using a servo. The Matrix Creator includes documentation and code for using a servo with its localization library. In addition, servos are cheap and readily available.

One way to perform sound source localization is to use direction of arrival (DOA) estimation. DOA estimates are calculations that determine the direction from which a signal arrives, often computed from beamforming techniques that exploit the delays and constructive/destructive interference between the signals at different microphones. We wanted to use these estimates in our design, and the ODAS (Open embeddeD Audition System) library provided by the Matrix company included basic DOA estimation programs. These programs generated a real-time angle estimate of the direction the sound was coming from. The software was loaded onto the Raspberry Pi and determined the direction from the data supplied by the microphone array on the Matrix Creator board.
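To make the delay idea concrete, here is a minimal sketch of DOA estimation for a single microphone pair: estimate the delay between the two signals from the peak of their cross-correlation, then convert it to an angle with a far-field model. This is not the ODAS algorithm; the function name and parameters are illustrative.

```python
import numpy as np

def estimate_doa(x1, x2, fs, mic_distance, c=343.0):
    """Estimate a source angle (degrees from broadside) for one mic pair
    from the peak of the cross-correlation between the two signals."""
    corr = np.correlate(x1, x2, mode="full")
    lag = np.argmax(corr) - (len(x2) - 1)  # delay in samples
    tdoa = lag / fs                        # time difference of arrival (s)
    # Far-field model: sin(theta) = c * tdoa / mic_distance
    sin_theta = np.clip(c * tdoa / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))

# A sound arriving from broadside reaches both mics at the same time,
# so identical signals should give an angle of zero.
fs = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(1024) / fs)
print(estimate_doa(tone, tone, fs, mic_distance=0.1))  # -> 0.0
```

A real system would repeat this over many mic pairs and short time frames, and use a more robust correlation (such as GCC-PHAT) to handle reverberation.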

Finally, we added the servo using another interface program from ODAS. This program allowed us to communicate between the Matrix Creator and the servo, so the servo spun in the direction determined by the DOA program. With the servo turning toward the direction of speech, all that was left to do was add our lovely Styrofoam head – Alexander – and construct a simple shelf for the head to spin on. A video of the demonstration at Engineering Open House is shown below:




Augmented Listening at Engineering Open House 2019

Have you ever wondered what it would sound like to listen through sixteen ears? This past March, hundreds of Central Illinois children and families experienced microphone-array augmented listening technology firsthand at the annual Engineering Open House (EOH) sponsored by the University of Illinois College of Engineering. At the event, which attracts thousands of elementary-, middle-, and high-school students and local community members, visitors learned about technologies for enhancing human and machine listening.

Listen up (or down): The technology of directional listening

Our team’s award-winning exhibit introduced visitors to several directional listening technologies, which enhance audio by isolating sounds that come from a certain direction. Directional listening is important when the sounds we want to hear are far away, or when there are many different sounds coming from different directions—like at a crowded open house! There are two ways to focus on sounds from one direction: we can physically block sounds from directions we don’t want, or we can use the mathematical tools of signal processing to cancel out those unwanted sounds. At our exhibit in Engineering Hall, visitors could try both.

Ryan holds up an ear horn at EOH 2019

This carefully designed mechanical listening device is definitely not an oil funnel from the local hardware store.

The oldest and most intuitive listening technology is the ear horn, pictured above. This horn literally funnels sound waves from the direction in which it is pointed. The effect is surprisingly strong, and there is a noticeable difference in the acoustics of the two horns we had on display. The shape of the horn affects both its directional pattern and its effect on different sound wavelengths, which humans perceive as pitch. The toy listening dish shown below operates on the same principle, but also includes an electronic amplifier. The funnels work much better for directional listening, but the spy gadget is the clear winner for style.

This toy listening dish is not very powerful, but it certainly looks cool!

These mechanical hearing aids rely on physical acoustics to isolate sound from one direction. To listen in a different direction, the user needs to physically turn them in that direction. Modern directional listening technology uses microphone arrays, which are groups of microphones spread apart from each other in space. We can use signal processing to compare and combine the signals recorded by the microphones to tell what direction a sound came from or to listen in a certain direction. We can change the direction using software, without physically moving the microphones. With sophisticated array signal processing, we can even listen in multiple directions at once, and can compensate for reflections and echoes in the room.
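The "listen in a certain direction using software" idea can be sketched with the simplest array processing method, delay-and-sum beamforming: delay each channel so that sound from the steering direction lines up, then average. This is only an illustrative sketch for a linear array (the names and parameters are mine), not the processing used in our exhibit.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, angle_deg, fs, c=343.0):
    """Delay-and-sum beamformer for a linear array.

    Delays each channel so that sound arriving from angle_deg (measured
    from broadside) adds up coherently, then averages the channels.
    signals: (num_mics, num_samples); mic_positions: meters along the axis.
    """
    num_mics, num_samples = signals.shape
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / fs)
    out = np.zeros(num_samples)
    for m in range(num_mics):
        tau = mic_positions[m] * np.sin(np.radians(angle_deg)) / c
        # Fractional delay applied as a phase shift in the frequency domain.
        shifted = np.fft.rfft(signals[m]) * np.exp(-2j * np.pi * freqs * tau)
        out += np.fft.irfft(shifted, n=num_samples)
    return out / num_mics
```

Steering broadside (0 degrees) applies no delay, so a sound hitting all microphones simultaneously passes through unchanged; sounds from other directions arrive misaligned and partially cancel when averaged.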

Continue reading

Capturing Data From a Wearable Microphone Array


Constructing a microphone array is a challenge of its own, but how do we actually process the microphone array data to do things like filtering and beamforming? One solution is to store the data on off-chip memory for later processing. This solution is great for experimenting with different microphone arrays since we can process the data offline and see what filter combinations work best from the data that we collected. This solution also avoids having to make changes to the hardware design any time we want to change filter coefficients or what algorithm is being implemented.

Overview of a basic microphone array system

Here’s a quick refresher on the DE1-SoC, the development board we use to process the microphone array data.

The main components that we use in this project are the GPIO pins, the off-chip DDR3 memory, the HPS, and the Ethernet port. The microphone array connects to the GPIO port of the FPGA. The digital I2S data is interpreted on the FPGA by deserializing it into samples. The 1 GB of off-chip memory is where the samples are stored for later processing. The HPS, which runs Linux, can grab the data from memory and store it on the SD card. Connecting the Ethernet port to a computer gives us the ability to grab the data from the FPGA seamlessly using shell and Python scripts.
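Once the raw capture reaches the computer, a short script can split it into per-channel sample streams for offline processing. Here is a hedged sketch; the interleaved, signed, native-endian sample layout is an assumption for illustration, not necessarily the exact format our FPGA writes to memory.

```python
import numpy as np

def split_channels(raw_bytes, num_channels, bits=32):
    """Split a raw capture dumped from the DDR3 buffer into per-channel
    sample arrays. Assumes signed, native-endian samples stored
    channel-interleaved: ch0, ch1, ..., chN-1, ch0, ch1, ...
    """
    dtype = {16: np.int16, 32: np.int32}[bits]
    samples = np.frombuffer(raw_bytes, dtype=dtype)
    usable = len(samples) - (len(samples) % num_channels)  # drop partial frame
    return samples[:usable].reshape(-1, num_channels).T

# Example: six 32-bit samples interleaved across two channels.
raw = np.array([1, 2, 3, 4, 5, 6], dtype=np.int32).tobytes()
left, right = split_channels(raw, num_channels=2)
print(left, right)  # -> [1 3 5] [2 4 6]
```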

Currently, the system is set up to stream the samples from the microphone array to the output of the audio codec. The microphones on the left side are summed and output to the left channel, and the microphones on the right side are summed and output to the right channel. The microphones are not processed before being sent to the codec. Here is a block diagram of what the system looks like before we add a DMA interface.

Continue reading

Talking Heads

Within the Augmented Listening team, it has been my goal to develop Speech Simulators for testing purposes. These would be distributed around the environment in a sort of ‘Cocktail Party’ scenario.


Why use a Speech Simulator instead of human subjects?

Human subjects can never say the same thing exactly the same way twice. By playing anechoic recordings of people speaking through loudspeakers, we can remove this variability from the experiment. We can also simulate the user’s own voice as captured by a wearable microphone array.


Why not just use normal Studio Monitors?

While studio monitors are designed to have a flat frequency response, which is perfect for this situation, their off-axis performance is not consistent with that of the human voice. Because most monitors use multiple drivers to achieve the desired frequency range, the dispersion is also inconsistent across the frequency range as the sound crosses over between the drivers.

Continue reading

How loud is my audio device? : Thinking about safe listening through the new WHO-ITU Standard

With March 3rd being World Hearing Day, WHO-ITU (World Health Organization and International Telecommunication Union) released a new standard for safe listening devices on February 12th, 2019. As our group researches ways to improve hearing through array processing, we also think that preventing hearing loss and taking care of our hearing is important. Hearing loss is usually permanent, and there is currently no treatment for restoring hearing once it is lost. In this post, I will revisit the new WHO-ITU standard for safe listening devices, and I will also test how loud my personal audio device is with respect to the new standard.

Summary of WHO-ITU standard for safe listening devices

In the new WHO-ITU standard for safe listening devices, WHO-ITU recommends including the following four functions in audio devices:

  • “Sound allowance” function: software that tracks the level and duration of the user’s exposure to sound as a percentage used of a reference exposure.
  • Personalized profile: an individualized listening profile, based on the user’s listening practices, which informs the user of how safely (or not) he or she has been listening and gives cues for action based on this information.
  • Volume limiting options: options to limit the volume, including automatic volume reduction and parental volume control.
  • General information: information and guidance to users on safe listening practices, both through personal audio devices and for other leisure activities.

Also, as written in the introduction of Safe Listening Devices and Systems, WHO-ITU considers a safe level of listening to be sound under 80 dB for a maximum of 40 hours per week. This recommendation is stricter than the standard currently implemented by OSHA (Occupational Safety and Health Administration), which enforces a PEL (permissible exposure limit) of 90 dBA for 8 hours per day, with the exposure time halving with each 5 dBA increase in the noise level. NIOSH (The National Institute for Occupational Safety and Health) also has a different set of recommendations concerning noise exposure: an exposure time of 8 hours for a noise level of 85 dBA, with the exposure time halving with each 3 dBA increase in the noise level. Under this recommendation, workers should be exposed to 100 dBA noise for only 15 minutes per day!
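These halving rules follow a simple formula: permissible time = reference time / 2^((level − reference level) / exchange rate). A small illustrative helper (the function name is mine):

```python
def max_exposure_hours(level_dba, reference_dba, reference_hours, exchange_rate_db):
    """Permissible daily exposure time under a halving ("exchange rate")
    rule: every exchange_rate_db above the reference level halves the time."""
    return reference_hours / 2 ** ((level_dba - reference_dba) / exchange_rate_db)

# NIOSH: 85 dBA reference, 8 hours, 3-dB exchange rate.
print(max_exposure_hours(100, 85, 8, 3) * 60)  # -> 15.0 (minutes)
# OSHA: 90 dBA reference, 8 hours, 5-dB exchange rate.
print(max_exposure_hours(100, 90, 8, 5))       # -> 2.0 (hours)
```

The 3-dB exchange rate is much stricter: at 100 dBA, NIOSH allows 15 minutes per day while OSHA allows 2 hours.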

Continue reading

Using Notch for Low-Cost Motion Capture

This semester, I was fortunate to be able to toy around with a six-pack of Notch sensors and do some basic motion capture. Later in the semester, I was asked to do a basic comparison of existing motion capture technology that could be used for the tracking of microphone arrays.

Motion capture is necessary for certain projects in our lab because it allows us to track the positions of multiple microphones in 3D space. When recording audio, the locations of the microphones are usually fixed, with known relative positions. These known positions allow us to determine the location of an audio source using triangulation.
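As a toy illustration of triangulation from known microphone positions (not the algorithm our localization system actually uses), here is a brute-force grid search that picks the source position whose predicted time differences of arrival best match the measured ones:

```python
import numpy as np

def localize_2d(mic_positions, tdoas, c=343.0):
    """Brute-force 2-D source localization: choose the grid point whose
    predicted time differences (relative to mic 0) best match the
    measured TDOAs."""
    mics = np.asarray(mic_positions, dtype=float)
    grid = np.linspace(-5.0, 5.0, 201)  # 5 cm resolution over a 10 m square
    best, best_err = (0.0, 0.0), np.inf
    for x in grid:
        for y in grid:
            dists = np.hypot(mics[:, 0] - x, mics[:, 1] - y)
            predicted = (dists[1:] - dists[0]) / c
            err = np.sum((predicted - np.asarray(tdoas)) ** 2)
            if err < best_err:
                best, best_err = (x, y), err
    return best
```

With four microphones at known positions and time differences measured by cross-correlation, this search recovers the source position to within the grid resolution; if the microphones move and their positions are not re-measured, the predicted delays are wrong and the estimate degrades, which is exactly why motion capture matters here.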

For a moving microphone array, the position of each microphone (and the space between them) must be known in order to do correct localization calculations. Currently, our project lead Ryan Corey is using an ultrasonic localization system which requires heavy computing power and is not always accurate.

This segment of my projects is dedicated to determining the effectiveness of Notch for future use in the lab.

Continue reading

Constructing Microphone Arrays

Microphone arrays are powerful listening and recording devices composed of many individual microphones operating together in tandem. Many popular microphone arrays (such as the one found in the Amazon Echo) are arranged circularly, but they can be in any configuration the designer chooses. In our Augmented Listening Lab, we strive to make these arrays wearable to assist the hard of hearing or to serve recording needs. Over the past year, I have been constructing functional prototypes of microphone arrays using MEMS microphones and FPGAs.

Above is a MEMS microphone breakout board created by Adafruit.

When placing these microphones into an array, they all share the Bit Clock, Left/Right Clock, 3V and Ground signals. All of the microphones share the same clock! Pairs of microphones share one Data Out line that goes to our array processing unit (in our lab we use an FPGA) and the Select pin distinguishes left and right channels for each pair.

The first microphone array I constructed was built on a construction helmet! The best microphone arrays take advantage of spatial area: the larger the area the microphones surround or cover, the better the array can distinguish sounds from different directions. Sometimes in our lab, we test audio using microphone arrays placed on sombreros – a wide and spacious surface. Another characteristic of good microphone array design is spacing the microphones evenly around the area. The construction helmet array I built had 12 microphones spaced around the outside on standoffs, with the wires kept on the inside.
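For an evenly spaced circular layout like the helmet brim, the microphone coordinates are easy to compute. Here is an illustrative helper (the names and dimensions are mine, not lab code):

```python
import numpy as np

def circular_array_positions(num_mics, radius):
    """(x, y) coordinates (meters) for microphones spaced evenly around
    a circle, e.g. the brim of a helmet or sombrero."""
    angles = 2 * np.pi * np.arange(num_mics) / num_mics
    return np.column_stack([radius * np.cos(angles), radius * np.sin(angles)])

# Twelve microphones around a helmet roughly 30 cm across.
positions = circular_array_positions(12, 0.15)
```

These coordinates are also what the array processing needs later: beamforming and localization both depend on knowing exactly where each microphone sits.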

Finally, we use a Field Programmable Gate Array (FPGA) to do real-time processing on these microphone arrays. SystemVerilog makes it easy to build modules that control microphone pairs and channels. FPGAs are best used in situations where performance needs to be maximized; in this case, we need to reduce latency as much as possible. In SystemVerilog, we can describe hardware tailored to our specific application and declare the necessary constraints to make our array as responsive and efficient as possible.

My next goal was to create a microphone array prototype that is wearable and has greater aesthetic appeal than the construction helmet. My colleague Uriah designed a pair of black, over-the-ear headphones that contain up to 11 MEMS microphones. The first iteration of this design was breadboarded, but future iterations will be cleaned up with a neat PCB design.

A picture of me wearing the breadboarded, over-the-ear headphone array.