Within the Augmented Listening team, it has been my goal to develop Speech Simulators for testing purposes. These would be distributed around the environment in a sort of ‘Cocktail Party’ scenario.
Why use a Speech Simulator instead of human subjects?
Human subjects can never say the same thing exactly the same way twice. By playing anechoic recordings of people speaking through loudspeakers, we can remove that human variability from the experiment. We can also simulate the user’s own voice as captured by a wearable microphone array.
Why not just use normal Studio Monitors?
While studio monitors are designed to have a flat frequency response, which is perfect for this situation, their off-axis performance is not consistent with that of the human voice. Because most monitors use multiple drivers to cover the desired frequency range, their dispersion also changes across that range as the output crosses over from one driver to the next.
What is the solution?
A single-driver system capable of covering the entire frequency range of speech with sufficient output. The goal for this project is to create plans for a loudspeaker capable of accurately recreating the directionality of human speech across the entire vocal spectrum.
The goal for the first iteration of the speech simulator was to find a driver small enough to have the desired off-axis response while still reproducing the entire vocal range and its harmonics, including fricative sounds like ‘f’ that can extend past 5 kHz and sibilant sounds like the hissing ‘s’ that reach up to an octave above that. Sounds in this range are the most important for speech intelligibility.
After comparing the published specifications of several drivers, the Tectonic TEBM35C10-4 seemed to be the best solution for this application. Its small size, combined with a flat diaphragm, enables it to have excellent off-axis response. Besides having a polar response similar to that of the human voice, this particular driver has a fairly smooth frequency response, dropping only 4 dB across its entire usable bandwidth. Despite its size, it can still reach down to a solid 80 Hz (typically the low end of a bass voice range) with an output of 90 dB at 1 meter.
When testing microphones on proxy heads made of different materials such as plastic, foam, and concrete, the concrete head yielded results most similar to the measurements taken on an actual human head. Because of this, it was important to find a similarly dense material for the construction of the speech simulator. Several methods were considered for creating the form. Casting was an immediate option. However, the task of creating a two-part mold and then a cast with the correct cavities for the speaker seemed too intensive, with too much room for error. Because a dense material was desired, carving was not an option that would fit into my time constraints. I decided to use a CNC machine to cut out layers of the head that could then be assembled. Three-quarter-inch plywood was chosen as the most cost-effective material, with a 4×8 sheet costing around $20.
A cavity (70 cubic inches) for the speaker was cut into the individual layers of wood, and a face plate was attached with gasket sealer. A 3.5 x 0.75 inch port was drilled through the bottom of the head, yielding a tuning frequency of approximately 80 Hz (as calculated in WinISD Pro).
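As a rough cross-check on that tuning figure, the port and cavity can be treated as a simple Helmholtz resonator. The sketch below is not how WinISD computes its result. It assumes that 3.5 inches is the port length and 0.75 inches is its diameter, that the speed of sound is 343 m/s, and that a common end correction of about 1.463 port radii applies (one flanged end, one free end). The function name and parameters are illustrative only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: air at roughly 20 degrees C
INCH_M = 0.0254             # metres per inch


def port_tuning_hz(box_volume_in3, port_diameter_in, port_length_in,
                   end_correction=1.463):
    """Estimate the Helmholtz tuning frequency of a ported cavity.

    end_correction is the extra effective port length, expressed in port
    radii. 1.463 (0.85 for a flanged end plus 0.613 for a free end) is a
    textbook approximation and an assumption here, not a value from the text.
    """
    volume_m3 = box_volume_in3 * INCH_M ** 3
    radius_m = port_diameter_in * INCH_M / 2
    area_m2 = math.pi * radius_m ** 2
    effective_length_m = port_length_in * INCH_M + end_correction * radius_m
    return (SPEED_OF_SOUND_M_S / (2 * math.pi)
            * math.sqrt(area_m2 / (volume_m3 * effective_length_m)))


# Dimensions from the text, with the length/diameter reading assumed above.
tuning = port_tuning_hz(box_volume_in3=70, port_diameter_in=0.75,
                        port_length_in=3.5)
uncorrected = port_tuning_hz(70, 0.75, 3.5, end_correction=0.0)
print(f"with end correction: {tuning:.0f} Hz, without: {uncorrected:.0f} Hz")
```

Under these assumptions the estimate lands in the 80 to 90 Hz region, in the same neighborhood as the approximately 80 Hz figure. The remaining gap is plausibly down to the end-correction model and the volume displaced by the driver.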
While the initial speech simulator did yield better results than the commercial speaker across almost the entire audio spectrum, it still did not quite match the dispersion characteristics of the human voice (-10 dB at 90 degrees off axis at 8 kHz). No single driver could perform any better on its own for this application. A new approach had to be taken.
The two most popular products currently on the market (Bruel & Kjaer Type 4227 Mouth Simulator and Type 4128-C Head and Torso Simulator) use waveguides to achieve better dispersion.
Beyond exterior product photos, there is little information on the specifications of the simulators or the type of waveguides they use. Going off the pictures, I created two waveguides to test, along with one other variation that I anticipated would also have a chance of yielding a decent response. Waveguide No. 1 was modeled on my assumptions about the 4227. It features a front chamber. Waveguide No. 2 swapped the front chamber for a longer throat, tapering down from the driver. This is what appears to be used in the 4128-C. For the third waveguide, I chose an approach resembling the phase lens found on many dome tweeters and compression drivers. The lens controls the destructive interference that occurs at higher frequencies, once the wavelength becomes shorter than the diameter of the driver. This interference is what causes a drop in off-axis response.
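To put a number on the wavelength rule of thumb described above, the following sketch computes the frequency at which one wavelength equals the driver diameter. The 35 mm diameter is a hypothetical value chosen for illustration (suggested by the part number, not stated in the text), and 343 m/s is an assumed speed of sound.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: air at roughly 20 degrees C
ASSUMED_DIAMETER_M = 0.035   # hypothetical diaphragm diameter, for illustration


def wavelength_m(frequency_hz):
    """Wavelength in air for a given frequency."""
    return SPEED_OF_SOUND_M_S / frequency_hz


def beaming_onset_hz(diameter_m):
    """Frequency at which one wavelength equals the driver diameter."""
    return SPEED_OF_SOUND_M_S / diameter_m


onset = beaming_onset_hz(ASSUMED_DIAMETER_M)
print(f"wavelength equals diameter at about {onset:.0f} Hz")

for frequency in (1000, 4000, 8000, 10000):
    lam = wavelength_m(frequency)
    relation = "shorter" if lam < ASSUMED_DIAMETER_M else "longer"
    print(f"{frequency:>6} Hz: wavelength {lam * 1000:.1f} mm, "
          f"{relation} than the assumed diameter")
```

In practice directivity narrows gradually rather than at a single frequency, so this marks only the point named by the rule of thumb, not a hard threshold.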
Second Iteration Testing
Based on listening alone, it became obvious that the front chamber of Waveguide No. 1 produced a bandpass effect, with a large peak in the passband followed by an attenuation of higher frequencies. Waveguide No. 2 reduced the output, as expected from its reverse-horn shape. Waveguide No. 3 made little audible difference to my ear.
The measurements reflected the bandpass effect of Waveguide No. 1. Waveguide No. 2 had less of an overall effect on the sound, but had slightly worse off-axis performance. Waveguide No. 3 was the most successful of the three. At 8 kHz and 90° off axis, it was down only about 8 dB from on axis. This performs almost too well, as the voice is about 10 dB down. This waveguide also caused a peak in the response at around 4 kHz. This is most likely due to the horn effect caused by the shape of the waveguide.
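To give a sense of how far apart the -8 dB and -10 dB figures are, the short sketch below converts them to sound pressure ratios relative to on axis, using the standard 20·log10 relationship for pressure. The variable names are illustrative.

```python
def db_to_pressure_ratio(level_db):
    """Convert a level difference in dB to a sound pressure ratio."""
    return 10 ** (level_db / 20)


waveguide_off_axis_db = -8.0   # Waveguide No. 3 at 8 kHz, 90 degrees off axis
voice_off_axis_db = -10.0      # human voice figure quoted in the text

waveguide_ratio = db_to_pressure_ratio(waveguide_off_axis_db)
voice_ratio = db_to_pressure_ratio(voice_off_axis_db)
excess_ratio = waveguide_ratio / voice_ratio

print(f"waveguide: {waveguide_ratio:.3f} of on-axis pressure")
print(f"voice:     {voice_ratio:.3f} of on-axis pressure")
print(f"waveguide is {excess_ratio:.2f}x the voice's off-axis pressure")
```

The 2 dB gap means the waveguide radiates roughly a quarter more sound pressure to the side than the voice does at that frequency.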