Motion and Audio, with Robots


Creating datasets is expensive, be it in terms of time or funding. This is especially true for spatial audio: Some applications require that hundreds of recordings are taken from specific regions in a room, while others involve arranging many microphones and loudspeakers to mimic real-life scenarios – for instance, a conference. Few researchers have access to dedicated recording spaces that can accurately portray acoustically-interesting environments, and fewer still are able to create dynamic scenes where microphones and speakers move precisely to replicate how people walk and talk.

To support the creation of these types of datasets, we propose the Mechatronic Acoustic Research System, or MARS for short. We envision MARS as a robot-enabled recording space that researchers would have remote access to. Users could emulate a wide variety of acoustic environments and take recordings with little effort. Our initial concept is for a website design interface that can be used to specify a complicated experiment, which a robot system then automatically recreates.

Diagram of MARS

How the MARS frontend and backend link together

The first step in developing such a tool is to show that robots and remote access are viable tools for research-grade audio data collection. We developed a simple scripting API in Python to coordinate the following devices:

  • SpiderBot – a cable-driven parallel robot that accurately moves a payload to various positions and heights
  • Linear Guide Rail – an actuated platform that slides along a rail
  • Acoustic Head Simulator – a 3D-printed model head with two microphones in the ear canals and a speaker at the mouth, capable of rotation
  • Audio Interface – a 64-in, 64-out audio interface that can be scripted for audio playback and recording
SpiderBot cable-driven parallel robot carrying a styrofoam model head as a payload, in the Augmented Listening Lab

The SpiderBot carrying an example payload

We designed a custom control framework using the Robot Operating System, ROS, to offer synchronized motion and audio playback or recording. We then ran two types of experiment that are relevant to ongoing research in the acoustic signal processing field:

  • Dense spatial sampling – the Acoustic Head Simulator was positioned by the Linear Guide Rail and rotated to various positions for recordings to be taken. This experiment ran autonomously over a span of multiple days
  • Dynamic scenes – a loudspeaker was moved along a path by the SpiderBot for simultaneous motion and playback. Recordings were taken from the Acoustic Head Simulator
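
A dense spatial sampling run boils down to visiting every combination of rail position and head angle. The sketch below only builds that list of stops; the function name, the argument names, and the grid values are hypothetical and are not part of the MARS API.

```python
from itertools import product

def sampling_plan(rail_positions_m, head_angles_deg):
    """Return every (rail position, head angle) pair to visit, in order.

    The names here are illustrative assumptions, not the real MARS interface.
    """
    return list(product(rail_positions_m, head_angles_deg))

# Hypothetical grid: 5 stops along the rail, 8 head orientations (45-degree steps).
rail_positions_m = [0.0, 0.25, 0.5, 0.75, 1.0]
head_angles_deg = list(range(0, 360, 45))

plan = sampling_plan(rail_positions_m, head_angles_deg)
print(len(plan))  # 5 * 8 = 40 recordings
```

Each entry in `plan` would correspond to one move-rotate-record step, which is why a fine grid can take days to finish.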

These experiments were each repeated multiple times, and the recordings were used to evaluate repeatability of the system, which was found to be near-ideal.
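
One simple way to compare two takes of the same experiment is sketched below. These two metrics (a zero-lag normalized correlation and a residual energy in dB) are assumptions chosen for illustration; the paper may use different measures.

```python
import math

def normalized_correlation(a, b):
    """Zero-lag normalized inner product of two equal-length recordings."""
    if len(a) != len(b):
        raise ValueError("recordings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    energy_a = sum(x * x for x in a)
    energy_b = sum(y * y for y in b)
    return dot / math.sqrt(energy_a * energy_b)

def residual_db(a, b):
    """Energy of the difference between two takes, relative to the first, in dB."""
    if len(a) != len(b):
        raise ValueError("recordings must have the same length")
    diff = sum((x - y) ** 2 for x, y in zip(a, b))
    ref = sum(x * x for x in a)
    return 10.0 * math.log10(diff / ref)

# Synthetic example: the second take is the first one scaled by 0.99.
take_a = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
take_b = [0.99 * x for x in take_a]

print(normalized_correlation(take_a, take_b))  # close to 1
print(residual_db(take_a, take_b))             # close to -40 dB
```

A correlation near 1 and a strongly negative residual would both indicate that the system reproduced the scene closely.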

This initial proof-of-concept for MARS showed that an openly-accessible data collection workbench is within reach. Currently, we are developing many new robots to integrate with MARS, which will allow for a greater range of experiments. Our website frontend is under development, and will be the key component that connects users of our system to a full suite of mechanized recording spaces. The digital twin physics simulation, which is used in the frontend to visualize experiments and by the backend to validate experiments, is being extended to support the growing roster of devices. With the successful implementation of a MARS prototype, we have moved one step closer to offering custom audio data collection for all.

See our IWAENC 2022 paper where we discuss the proof-of-concept implementation in detail, introducing MARS to the research community for the first time.

Hearing aid algorithm adapted for COVID-19 ventilators

Audio signal processing would seem to have nothing to do with the COVID-19 pandemic. It turns out, however, that a low-complexity signal processing algorithm used in hearing aids can also be used to monitor breathing for patients on certain types of ventilator.

To address the shortage of emergency ventilators caused by the pandemic, this spring the Grainger College of Engineering launched the Illinois RapidVent project to design an emergency ventilator that could be rapidly and inexpensively produced. In little more than a week, the team built a functional pressure-cycled pneumatic ventilator, which is now being manufactured by Belkin.

The Illinois RapidVent is powered by pressurized gas and has no electronic components, making it easy to produce and to use. However, it lacks many of the monitoring features found in advanced commercial ventilators. Without an alarm to indicate malfunctions, clinicians must constantly watch patients to make sure that they are still breathing. More-advanced ventilators also display information about pressure, respiratory rate, and air volume that can inform care decisions.

The Illinois RapidAlarm adds monitoring features to pressure-cycled ventilators.

To complement the ventilator, a team of electrical engineers worked with medical experts to design a sensor and alarm system known as the Illinois RapidAlarm. The device attaches to a pressure-cycled ventilator, such as the Illinois RapidVent, and monitors the breathing cycle. The device includes a pressure sensor, a microcontroller, a buzzer, three buttons, and a display. It shows clinically useful metrics and sounds an audible alarm when the ventilator stops working. The hardware design, firmware code, and documentation are available online with open-source licenses. A paper describing how the system works is available on arXiv.
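
To give a feel for what monitoring the breathing cycle involves, here is a minimal sketch that detects breaths in a pressure signal with a two-threshold (hysteresis) detector. This is not the RapidAlarm algorithm, which is described in the arXiv paper; the thresholds, sample rate, and function names are all assumptions made for illustration.

```python
def detect_breaths(pressure, sample_rate_hz, high, low):
    """Return the start time in seconds of each detected breath.

    A breath starts when the pressure rises above `high`. The detector
    re-arms only after the pressure falls below `low` (hysteresis).
    """
    breath_times = []
    armed = True
    for n, p in enumerate(pressure):
        if armed and p > high:
            breath_times.append(n / sample_rate_hz)
            armed = False
        elif not armed and p < low:
            armed = True
    return breath_times

def respiratory_rate_bpm(breath_times):
    """Average breaths per minute, or None with fewer than two breaths."""
    if len(breath_times) < 2:
        return None
    span_s = breath_times[-1] - breath_times[0]
    return 60.0 * (len(breath_times) - 1) / span_s

def alarm_needed(breath_times, now_s, timeout_s):
    """True when no breath has started within the last `timeout_s` seconds."""
    last = breath_times[-1] if breath_times else 0.0
    return now_s - last > timeout_s

# Synthetic signal sampled at 10 Hz: 1 s at high pressure, 2 s at low, 4 cycles.
pressure = ([20.0] * 10 + [5.0] * 20) * 4
breaths = detect_breaths(pressure, sample_rate_hz=10, high=15.0, low=10.0)
print(breaths)                        # [0.0, 3.0, 6.0, 9.0]
print(respiratory_rate_bpm(breaths))  # 20.0
```

With these synthetic numbers, `alarm_needed(breaths, now_s=20.0, timeout_s=5.0)` returns `True`, since the last breath started at 9 s.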


The EchoXL is a large-format, Alexa-powered smart speaker developed as part of TEC’s Alexa program. Based on current market offerings, it would be the largest of its kind. Its form and features will be modeled after Amazon’s Echo speaker, both to keep branding consistent and to exemplify a potential line expansion. While small Bluetooth speakers still hold the largest market segment in audio, the market for larger sound systems has been steadily increasing over the past few years (as evidenced by new products from LG, Samsung, Sony, and JBL). Currently, Amazon does not have any products in this category.

The speaker will be used as a public demonstration piece to exhibit the current technology incorporated within smart speakers, such as microphone arrays and internal room correction capabilities. The novelty of a scaled-up Echo speaker will also be useful for attracting press coverage of the group’s research.

Studio-Quality Recording Devices for Smart Home Data Collection

Alexa, Google Home, and Facebook smart devices are becoming more and more commonplace in the home. Although many individuals only use these smart devices to ask for the time or weather, they serve as an important edge controller for the Internet of Things infrastructure.

Unknown to some consumers, Alexa and other smart devices contain multiple microphones. Alexa uses these microphones in order to determine the direction of the speaker, and display a light almost as if to “face” the user. This localization function is also very important for processing whatever is about to be said after “Alexa”, or “OK Google”.
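
As a rough illustration of how two microphones can reveal direction, the sketch below estimates the time difference of arrival by brute-force cross-correlation and converts it to a far-field angle. This is a textbook approach, not the method used inside any commercial smart speaker; the spacing, sample rate, and function names are assumptions.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate value in air at room temperature

def best_lag(x, y, max_lag):
    """Return the lag in samples by which y is delayed relative to x."""
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n in range(len(x)):
            m = n + lag
            if 0 <= m < len(y):
                score += x[n] * y[m]
        if score > best_score:
            best, best_score = lag, score
    return best

def arrival_angle_deg(lag_samples, sample_rate_hz, mic_spacing_m):
    """Far-field angle from broadside for a two-microphone pair."""
    delay_s = lag_samples / sample_rate_hz
    sine = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    sine = max(-1.0, min(1.0, sine))  # guard against rounding past +/-1
    return math.degrees(math.asin(sine))

# Synthetic example: the second microphone hears the same click 3 samples later.
mic_a = [0.0] * 10 + [1.0, 0.5, -0.3] + [0.0] * 10
mic_b = [0.0] * 3 + mic_a[:-3]

lag = best_lag(mic_a, mic_b, max_lag=8)
angle = arrival_angle_deg(lag, sample_rate_hz=48000, mic_spacing_m=0.1)
print(lag)  # 3
```

A positive lag means the sound reached the first microphone earlier, so the source sits on that microphone's side of the pair.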

In our research lab, this kind of localization is important and we hope to extrapolate more from individuals’ interactions with their home smart speaker. The final details of the experiments we hope to run are not yet concrete. However, we know that we will need our own Alexa-like device that can make studio-quality recordings on a number of different channels.

Sound Source Localization

Imagine you are at a noisy restaurant. You hear the clanging of the dishes, the hearty laughs from the patrons around you, and the musical ambience, and you are struggling to hear your friend from across the table. Wouldn’t it be nice if the primary sound that you heard came solely from your friend? That is the problem that sound source localization can help solve.

Sound source localization, as you might have guessed, is the process of determining where a sound is coming from, which in turn helps a device pick out the sounds that you want to amplify. It is how your Amazon Echo Dot indicates who is speaking to it with the little ring at the top. For Engineering Open House, we wanted to create a device that can mimic the colorful ring at the top in a fun, creative way. Instead of a colorful light ring, we wanted to use a mannequin head that turns towards audience members when they speak to it.

My colleague Manan and I designed “Alexander”, the spinning head that can detect speech. We knew our system had to contain a microphone array, a processor to control the localization system, and a motor to turn the mannequin head. Our choices for each component are as follows:

Capturing Data From a Wearable Microphone Array


Constructing a microphone array is a challenge of its own, but how do we actually process the microphone array data to do things like filtering and beamforming? One solution is to store the data on off-chip memory for later processing. This solution is great for experimenting with different microphone arrays since we can process the data offline and see what filter combinations work best from the data that we collected. This solution also avoids having to make changes to the hardware design any time we want to change filter coefficients or what algorithm is being implemented.
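
As an example of the kind of offline processing this enables, here is a minimal delay-and-sum beamformer over stored samples. It assumes non-negative integer sample delays that are already known for the desired look direction; the function name and the toy data are hypothetical.

```python
def delay_and_sum(channels, delays_samples):
    """Average the channels after advancing each one by its integer delay.

    channels: list of equal-rate sample lists, one per microphone.
    delays_samples: non-negative arrival delay of each channel, in samples.
    """
    if len(channels) != len(delays_samples):
        raise ValueError("need exactly one delay per channel")
    length = min(len(ch) - d for ch, d in zip(channels, delays_samples))
    output = []
    for n in range(length):
        total = sum(ch[n + d] for ch, d in zip(channels, delays_samples))
        output.append(total / len(channels))
    return output

# Toy data: the same source reaches three microphones 0, 1, and 2 samples late.
source = [0, 1, 0, -1, 0, 1, 0, -1]
channels = [source + [0, 0], [0] + source + [0], [0, 0] + source]

aligned = delay_and_sum(channels, delays_samples=[0, 1, 2])
print(aligned == source)  # True: the aligned average recovers the source
```

Because the samples are stored, the same recording can be re-run with different delays or filters without touching the hardware design.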

Overview of a basic microphone array system

Here’s a quick refresher of the DE1-SoC, the development board we use to process the microphone array.

The main components that we use in this project are the GPIO pins, the off-chip DDR3 memory, the HPS, and the Ethernet port. The microphone array connects to the GPIO port of the FPGA. The digital I2S data is interpreted on the FPGA by deserializing it into samples. The 1-GB off-chip memory is where the samples will be stored for later processing. The HPS, which runs Linux, will be able to grab the data from memory and store it on the SD card. Connecting the Ethernet port to a computer gives us the ability to grab the data from the FPGA seamlessly using shell and Python scripts.
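
Deserializing means collecting the bits that arrive one at a time and assembling them into signed samples. The software model below shows the idea for most-significant-bit-first, two's-complement words. It is a simplification: it ignores the word-select timing and slot padding of a real I2S frame, and the 4-bit word size is chosen only to keep the example short.

```python
def bits_to_sample(bits):
    """Convert MSB-first bits into a signed two's-complement integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    if bits and bits[0]:
        value -= 1 << len(bits)  # sign bit set: wrap into the negative range
    return value

def deserialize(bitstream, word_bits):
    """Split a flat bit stream into consecutive signed samples."""
    usable = len(bitstream) - len(bitstream) % word_bits
    return [
        bits_to_sample(bitstream[i:i + word_bits])
        for i in range(0, usable, word_bits)
    ]

# Three 4-bit words: 0101, 1111, 1000
stream = [0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(deserialize(stream, word_bits=4))  # [5, -1, -8]
```

On the FPGA the same job is done by a shift register clocked by the bit clock, but the arithmetic is identical.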

Currently the system is set up to stream the samples from the microphone array to the output of the audio codec. The microphones on the left side are summed and output to the left channel, and the microphones on the right side are summed and output to the right channel. The microphone signals are not processed before being sent to the codec. Here is a block diagram of what the system looks like before we add a DMA interface to the system.

Talking Heads

Within the Augmented Listening team, it has been my goal to develop Speech Simulators for testing purposes. These would be distributed around the environment in a sort of ‘Cocktail Party’ scenario.


Why use a Speech Simulator instead of human subjects?

Human subjects can never say the same thing exactly the same way twice. By playing anechoic recordings of people speaking through loudspeakers, we can remove the human error from the experiment. We can also simulate the user’s own voice captured by a wearable microphone array.


Why not just use normal Studio Monitors?

While studio monitors are designed to have a flat frequency response perfect for this situation, their off-axis performance is not consistent with that of the human voice. As most monitors use multiple drivers to achieve the desired frequency range, the dispersion is also inconsistent across the frequency range as it crosses between the drivers.

Using Notch for Low-Cost Motion Capture

This semester, I was fortunate to be able to toy around with a six-pack of Notch sensors and do some basic motion capture. Later in the semester, I was asked to do a basic comparison of existing motion capture technology that could be used for the tracking of microphone arrays.

Motion capture is necessary for certain projects in our lab because it allows us to track the positions of multiple microphones in 3D space. When recording audio, the locations of the microphones are usually fixed, so the differences in their positions are known. These known offsets allow us to determine the relative location of an audio source using triangulation.

For a moving microphone array, the position of each microphone (and the space between them) must be known in order to do correct localization calculations. Currently, our project lead Ryan Corey is using an ultrasonic localization system which requires heavy computing power and is not always accurate.

This segment of my project is dedicated to determining the effectiveness of Notch for future use in the lab.

Constructing Microphone Arrays

Microphone arrays are powerful listening and recording devices composed of many individual microphones operating in tandem. Many popular microphone arrays (such as the one found in the Amazon Echo) are arranged circularly, but they can be in any configuration the designer chooses. In our Augmented Listening Lab, we strive to make these arrays wearable to assist the hard of hearing or to serve recording needs. Over the past year, I have been constructing functional prototypes of microphone arrays using MEMS microphones and FPGAs.

Above is a MEMS microphone breakout board created by Adafruit. You can find it here:

When placing these microphones into an array, they all share the Bit Clock, Left/Right Clock, 3V and Ground signals. All of the microphones share the same clock! Pairs of microphones share one Data Out line that goes to our array processing unit (in our lab we use an FPGA) and the Select pin distinguishes left and right channels for each pair.
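
Since each pair of microphones shares a data line, the receiving side has to split every line back into two channels. The sketch below models that in software, assuming the words on a line simply alternate left, right, left, right, starting with the left microphone; the function names are hypothetical.

```python
def split_pair(words):
    """Split the words from one shared data line into (left, right) channels.

    Assumes the words alternate left, right, starting with left.
    """
    return words[0::2], words[1::2]

def split_array(data_lines):
    """Turn one list of words per data line into one list per microphone."""
    mics = []
    for words in data_lines:
        left, right = split_pair(words)
        mics.extend([left, right])
    return mics

# Two data lines, so four microphones in total.
data_lines = [[1, 2, 3, 4], [10, 20, 30, 40]]
print(split_array(data_lines))  # [[1, 3], [2, 4], [10, 30], [20, 40]]
```

With this layout, a 12-microphone array needs only six data lines plus the shared clocks.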

The first microphone array I constructed was built on a construction helmet! The best microphone arrays leverage spatial area – the larger the area the microphones surround or cover, the clearer the audio is. Sometimes in our lab, we test audio using microphone arrays placed on sombreros, which offer a wide and spacious area. Another characteristic of good microphone array design is spacing the microphones evenly around the area. The construction helmet array I built had 12 microphones spaced around the outside on standoffs, and I kept the wires on the inside.

Finally, we use a Field Programmable Gate Array (FPGA) to do real-time processing on these microphone arrays. SystemVerilog makes it easy to build modules that control microphone pairs and channels. FPGAs are best used in situations where performance needs to be maximized; in this case, we need to reduce latency as much as possible. In SystemVerilog we can describe hardware tailored to our specific application and declare the necessary constraints to make our array as responsive and efficient as possible.

My next goal was to create a microphone array prototype that’s wearable and has greater aesthetic appeal than the construction helmet. My colleague, Uriah, designed a pair of black, over-the-ear headphones that contain up to 11 MEMS microphones. The first iteration of this design was breadboarded, but future iterations will be cleaned up with a neat PCB design.

A pic of me wearing the breadboarded, over-the-ear headphone array.

Tutorial 1: Simple Accumulator


In this first tutorial we will create a design that will be able to increment or decrement a value by pushing buttons on the FPGA and displaying the hex value on the hex display on the board. This will cover combinational logic design, sequential logic design, and how to interface some of the peripherals on the FPGA as well as loading the design on the board.

Sequential vs parallel programming languages

Solving this problem with a microprocessor that executes instructions sequentially (like an Arduino) is pretty trivial to do. The code might look something like this…

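/* PSEUDO CODE (will not compile on Arduino's IDE) */
   if (button1){
      accumulator = accumulator + 1;   // increment once per press
      while (button1) {}               // wait until button1 is released
   } else if (button2){
      accumulator = accumulator - 1;   // decrement once per press
      while (button2) {}               // wait until button2 is released
   }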

Solving this problem in a hardware description language might not be so obvious. In SystemVerilog we are describing a digital circuit that will execute code in parallel. In order to achieve the same functionality as the sequential code from above we have to create a combinational and sequential logic circuit.

What is a combinational logic circuit?

A combinational logic circuit has a defined output for every combination of its inputs – it’s kind of similar to a math function. Let’s consider an example of a familiar math function: f(x) = x^2. This function has an output for any real value that is fed to it: f(0) = 0, f(2) = 4, f(3) = 9, f(3.14159) = 9.8696, etc. In the digital world we can also have functions like this, but our inputs and outputs can only take the value 0 or 1. These functions are called boolean functions, and they are made up of logic gates like AND, OR, NOT, NAND, etc. The link below covers the functionality of these logic gates with diagrams and truth tables.

We can piece these logic gates together to create a combinational logic circuit and represent it with a function. Let’s create an XOR (exclusive or) circuit using AND, OR, and NOT gates. We will first create a truth table for XOR.

x  y | z
0  0 | 0
0  1 | 1
1  0 | 1
1  1 | 0

This function’s output is 1 when x is 0 and y is 1 (x’y), OR when x is 1 and y is 0 (xy’). If they are both 1 or both 0, the output is 0. We can write this function as z = x’y + xy’. The concatenation of two variables ‘xy’ represents the AND operation, the ’ represents the NOT operation, and the ‘+’ represents the OR operation. Our combinational circuit looks like this.
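
The expression z = x’y + xy’ can also be checked in software before building anything. The sketch below models the three gates as small Python functions (the function names are made up for this example) and prints the resulting truth table.

```python
def not_gate(a):
    """NOT: invert a single bit."""
    return 1 - a

def and_gate(a, b):
    """AND: 1 only when both inputs are 1."""
    return a & b

def or_gate(a, b):
    """OR: 1 when at least one input is 1."""
    return a | b

def xor_from_gates(x, y):
    """z = x'y + xy', built only from NOT, AND, and OR."""
    return or_gate(and_gate(not_gate(x), y), and_gate(x, not_gate(y)))

print("x  y | z")
for x in (0, 1):
    for y in (0, 1):
        print(x, "", y, "|", xor_from_gates(x, y))
```

The z column comes out as 0, 1, 1, 0, matching the XOR truth table above.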

That’s pretty neat, but how do we apply this to the problem we are trying to solve?

So now we know how to create a logic circuit, but how would we make a circuit that can increment and decrement a variable? We can start by thinking about what the inputs to this circuit would be: two different buttons on the FPGA, where one button is the signal to increment and the other is the signal to decrement. We also need to think about how we can create something like a variable in hardware. More specifically, we need a way to remember what value our accumulator had, and a way for the inputs to control what value it takes next. To accomplish this we have to understand how sequential logic works.
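
Before getting into the hardware details, it can help to see the idea as a software model: a stored value that is updated once per clock tick by a function of the current value and the inputs. The sketch below is only a Python simulation of that idea, not SystemVerilog, and the 8-bit width and all names are assumptions. It reacts to the moment a button goes from released to pressed, which plays the same role as the wait-for-release loops in the sequential pseudocode.

```python
WIDTH = 8  # hypothetical register width in bits

def next_value(value, inc, dec, prev_inc, prev_dec):
    """Combinational part: decide what the register should hold next."""
    if inc and not prev_inc:          # increment button was just pressed
        return (value + 1) % (1 << WIDTH)
    if dec and not prev_dec:          # decrement button was just pressed
        return (value - 1) % (1 << WIDTH)
    return value                      # otherwise hold the stored value

def simulate(button_samples):
    """Sequential part: update the stored state once per clock tick."""
    value, prev_inc, prev_dec = 0, 0, 0
    for inc, dec in button_samples:
        value = next_value(value, inc, dec, prev_inc, prev_dec)
        prev_inc, prev_dec = inc, dec
    return value

# (increment, decrement) button levels at eight successive clock ticks.
ticks = [(1, 0), (1, 0), (0, 0), (1, 0), (0, 0), (0, 1), (0, 1), (0, 0)]
print(simulate(ticks))  # two presses up, one press down: 1
```

Holding a button across several ticks counts as a single press, and decrementing from 0 wraps around to 255 because the model keeps only 8 bits.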

What is sequential logic?

* Fill in detailed description *

Finite state machine design

Transferring your design into functional code

Quartus Tutorial

Actual FSM Behavior in hardware