Although source separation (separating distinct and overlapping sound sources from each other and from dispersed noise) in a small, quiet lab with only a few speakers usually produces excellent results, such a situation may not always be present. In a large reverberant room with many speakers, for example, it may be difficult for a person or speech recognition system to keep track of and to comprehend what one particular speaker is saying. But using source separation in such a scenario to improve intelligibility is quite difficult without having external information that in itself may also be difficult to obtain.
Many source separation methods work well up to only a certain number of speakers – typically not much more than four or five. Moreover, some of these methods rely on constraining the number of microphones to an amount equal to the number of sources, and will scale poorly in terms of results, if at all, with the addition of more microphones. Limiting the number of microphones to the number of speakers will not work in these difficult scenarios, but adding more microphones may help, due to the greater amount of spatial information made available by the additional microphones. Therefore, the motivation behind this experiment was to find a suitable source separation method, or series of such methods, that could leverage the “massive” number of microphones used in the Massive Distributed Microphone Array Dataset and solve the particularly challenging problem of separating ten speech sources. Ideally, the algorithm would rely on as little external information as possible, instead relying on the wealth of information gathered by the microphone arrays distributed around the conference room. Thus, the delay-and-sum beamformer was first considered for this task, because it only requires the locations of each source and microphone, and it inherently scales well with a large number of microphones.
Shown above is a diagram depicting the setup for the Massive Distributed Microphone Array Dataset. Of note is the fact that there are two distinct types of arrays – wearable arrays, denoted by the letter W and numbered from 1-4, and tabletop arrays, denoted by the letter T and numbered from 1-12. Wearable arrays have 16 microphones each, whereas tabletop arrays have 8.
Face masks are a critical tool in slowing the spread of COVID-19, but they also make communication more difficult, especially for people with hearing loss. Face masks muffle high-frequency speech sounds and block visual cues. Masks may be especially frustrating for teachers, who will be expected to wear them while teaching in person this fall. Fortunately, not all masks affect sound in the same way, and some masks are better than others. Our research team measured several face masks in the Illinois Augmented Listening Laboratory to find out which are the best for sound transmission, and to see whether amplification technology can help.
Several months into the pandemic, we now have access to a wide variety of face masks, including disposable medical masks and washable cloth masks in different shapes and fabrics. A recent trend is cloth masks with clear windows, which let listeners see the talker’s lips and facial expressions. In this study, we tested a surgical mask, N95 and KN95 respirators, several types of opaque cloth mask, two cloth masks with clear windows, and a plastic face shield.
We measured the masks in two ways. First, we used a head-shaped loudspeaker designed by recent Industrial Design graduate Uriah Jones to measure sound transmission through the masks. A microphone was placed at head height about six feet away, to simulate a “social-distancing” listener. The loudspeaker was rotated on a turntable to measure the directional effects of the mask. Second, we recorded a human talker wearing each mask, which provides more realistic but less consistent data. The human talker wore extra microphones on his lapel, cheek, forehead, and in front of his mouth to test the effects of masks on sound capture systems.