**Implementation of an Acoustic Sensor Array on a Mobile Robotic Device for Estimating Location of a Stationary Target**

*Tripp McGehee*

*Supervisor: Professor Arye Nehorai*

Department of Electrical and

Spring 2007

We implemented an acoustic sensor array on a mobile robotic device for estimating the location of a stationary target. Our goal was to build a robot that could adaptively locate and move towards a stationary sound source. We mounted an array of four omnidirectional microphones, each with its own sound card, on a Lego Mindstorm. The measurements were transmitted to a computer through a USB port and processed using LabVIEW. As a first approach, we estimated the time differences of arrival of the sound wave at each microphone in order to estimate the direction of the acoustic source relative to the robot. The computer then sent a command via USB instructing the robot to rotate towards the estimated direction of the sound. We addressed two major technical issues while implementing this project: sensor calibration and simultaneous sampling using four independent sound cards. The results of our first experiment in finding the direction of the acoustic source are encouraging. However, more precise sampling control is required before more sophisticated algorithms, such as maximum likelihood estimation, can be implemented successfully.

The general overview of this process can be separated into a few steps as illustrated in Fig. 1:

Fig. 1. Schematic representation of the main stages of our project.

__Sample a Sound:__

This stage is formed by three consecutive steps which are illustrated in Fig. 2:

Fig. 2. Main steps for sampling a sound wave using independent USB sound cards.

**Calibration:** The microphones were not sampled simultaneously. The time delay between the first samples of each microphone was observed to be between 0 msec and 60 msec. The delay was caused by the way the computer sent the “start sampling” command to the microphones: the command was sent to each microphone separately, so the delay depended on what other processes the computer was running at the time. Fig. 3 illustrates the acquisition delay between two sound cards that are fed simultaneously with the same waveform generated by the computer.


Fig. 3. Illustration of the processing delay between samples taken from two independent channels fed simultaneously with the same waveform.


To correct for this dynamic delay, a calibration signal was sent to each microphone port. All four microphone ports were connected to a single headphone port with approximately equal lengths of speaker cable, so that the calibration signal would reach the microphone ports at approximately the same time. The delay between the calibration signal reaching each microphone port was assumed to be negligible, because it would be caused only by minute differences in the length and impedance of the speaker cables, producing delays that were insignificant compared to the delays we were dealing with in this project.

Once the calibration signal was sampled by all four microphones, an algorithm determined the delay between each microphone’s set of samples. The sets did not always have the same number of samples; this was due to the way the algorithm determined the beginnings and ends of a sound, and will be explained later.

The algorithm takes the set of samples from one microphone and compares it in turn to the other three. In this way, we can find the delay of each microphone’s data relative to the one microphone we choose as a reference. The algorithm starts by determining which sample set is shorter and which is longer. It then takes a subset of the longer sample set which begins at index 0 and is equal in length to the shorter sample set. For example, if the shorter sample set has 1000 samples, the subset of the longer set starts at index 0 and goes to index 999. It then calculates the correlation coefficient between the shorter set and this subset of the longer set. Next, it shifts the subset by one sample, so it starts at index 1 and is still equal in length to the shorter sample set, and calculates the correlation coefficient again. It continues calculating correlation coefficients and shifting until the subset reaches the end of the longer sample set. The shift at which the highest correlation coefficient occurs is the point at which the two sets best line up. Fig. 4 illustrates the cross-correlation function between two channels.

After this shifting process is done, the difference between the starting index and the index at which the highest correlation coefficient was found on the longer sample set is the delay between the two sample sets. This process is repeated until the delay is found for all microphones relative to the reference microphone.
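The sliding-window procedure above can be sketched in Python (a sketch under our assumptions; the original implementation was in LabVIEW, and the function name is illustrative):

```python
import numpy as np

def estimate_delay(ref, other):
    """Slide the shorter sample set along the longer one, computing a
    correlation coefficient at each offset, and return the offset of the
    best match: the delay (in samples) of `ref` relative to `other`."""
    ref, other = np.asarray(ref, dtype=float), np.asarray(other, dtype=float)
    if len(ref) <= len(other):
        short, long_, flipped = ref, other, False
    else:
        short, long_, flipped = other, ref, True
    n = len(short)
    best_offset, best_corr = 0, -np.inf
    for offset in range(len(long_) - n + 1):
        window = long_[offset:offset + n]         # subset equal in length to the shorter set
        corr = np.corrcoef(short, window)[0, 1]   # correlation coefficient at this shift
        if corr > best_corr:
            best_corr, best_offset = corr, offset
    # Offset at which the shorter set best aligns within the longer one;
    # the sign is flipped when the roles of the two inputs were swapped.
    return best_offset if not flipped else -best_offset
```

Applying this once per non-reference microphone yields the three delays relative to the chosen reference microphone.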

The initial delays between when each microphone starts sampling are constant throughout a sampling session. This means that once they are known, we can correct for them when finding arrival-time delays.

Fig. 4. Illustration of the cross-correlation function between two channels and its delay estimation.

**Find the Beginning of a Sound:** After calibration, the next step is to look for the beginning of a sound. Noise measured by the microphone was assumed to be additive white Gaussian noise, and this was confirmed experimentally. To distinguish between white Gaussian noise and a sound we would be interested in, a relatively quiet room was recorded for a length of time. From a histogram of the samples, the standard deviation was computed. We assumed that if a sample was more than six standard deviations away from the mean, then it was not white Gaussian noise. Therefore, to find the beginning of a sound, we checked every incoming sample against a threshold set over six standard deviations away from the mean.
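This thresholding rule can be expressed as a minimal sketch (the function name is illustrative; `noise_mean` and `noise_std` would come from the quiet-room recording described above):

```python
import numpy as np

def find_sound_onset(samples, noise_mean, noise_std, k=6.0):
    """Return the index of the first sample more than k noise standard
    deviations from the noise mean, or None if every sample looks like
    white Gaussian noise."""
    samples = np.asarray(samples, dtype=float)
    loud = np.abs(samples - noise_mean) > k * noise_std
    return int(np.argmax(loud)) if loud.any() else None
```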

**Find the End of a Sound:** Since the noise is assumed to be white Gaussian, a six-standard-deviation threshold can be used again. If the samples fall below the threshold for a period of time, then we assume the sound has ended.
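The end-of-sound rule can be sketched the same way; the `hold` length (how long the signal must stay below the threshold) is an assumption, since the report does not give a specific duration:

```python
def find_sound_end(samples, start, noise_mean, noise_std, k=6.0, hold=100):
    """Return the index where the sound ends: the first sample at or
    after `start` that begins a run of `hold` consecutive samples within
    k standard deviations of the noise mean."""
    threshold = k * noise_std
    quiet = 0
    for i in range(start, len(samples)):
        if abs(samples[i] - noise_mean) <= threshold:
            quiet += 1
            if quiet >= hold:
                return i - hold + 1   # first sample of the quiet run
        else:
            quiet = 0
    return len(samples)               # sound ran to the end of the buffer
```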


__Signal Processing:__

This stage is formed by three consecutive steps which are illustrated in Fig. 5:

Fig. 5. Schematic illustration of the main steps for estimating the sound location.

**Sound Detection Error Check:** The Sample a Sound process detailed in the section above happens independently for each microphone. This means that one microphone can register a sound while the others do not register anything. This can happen for a number of reasons. For example, if a person tapped one microphone, it would register a very strong signal, but none of the others would register a signal beyond normal noise. This would be a faulty sound measurement, and the very strong signal should be discarded.

The correct detection case is when all the microphones detect a sound at approximately the same time and all the detected sounds are approximately the same length. To determine whether this case has been reached, we use the start and end times found in the phases detailed in the Sample a Sound section.
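One way to encode this check (the spread tolerances are assumptions, not values from the report):

```python
def detections_consistent(onsets, lengths, max_onset_spread, max_length_spread):
    """Accept a detection only if every microphone registered a sound at
    approximately the same time and of approximately the same length
    (onsets and lengths in samples, one entry per microphone)."""
    onset_ok = max(onsets) - min(onsets) <= max_onset_spread
    length_ok = max(lengths) - min(lengths) <= max_length_spread
    return onset_ok and length_ok
```

A detection that fails this check, such as a single tapped microphone, would be discarded rather than passed on to delay estimation.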

**Find the Delays:** Once all microphones have recorded a sound, an algorithm similar to the calibration algorithm is used. A sliding correlation-coefficient window finds the point at which the sounds best line up, giving the delay in arrival time between the two microphones.
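Since the calibration offsets are constant within a session, the raw cross-correlation delay can be corrected and converted to a true arrival-time difference. A sketch (the function name and sign convention are illustrative):

```python
def arrival_time_difference(raw_delay_samples, calib_delay_samples, sample_rate):
    """Subtract the constant start-of-sampling offset measured during
    calibration from the raw cross-correlation delay between two
    microphones, and convert the result from samples to seconds."""
    return (raw_delay_samples - calib_delay_samples) / sample_rate
```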

**Find the Sound Location:** Our experiment did not reach the point of finding the coordinates of the sound location, which would have been our next step. We were able to find a rough estimate of the direction of the source by using the estimated delays between two microphones. For instance, using only the sign of the estimated delay we could determine whether the source was to the right or the left. In addition, if we know a priori that the source is far from the array, then we can assume that the acoustic field at the sensor positions can be described by a plane wave, so we can estimate its direction of arrival using the distance between sensors and the speed of sound, V_{s}. Fig. 6 illustrates a linear array of two microphones sensing a far acoustic field arriving from direction θ, where C(**s**_{1}, **s**_{2}) is the cross-correlation between the measurements **s**_{1} and **s**_{2} taken from microphone 1 and microphone 2, respectively.

Fig. 6. Linear array formed by two microphones for estimating delay and angle of arrival of a far acoustic field.
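Under the far-field plane-wave assumption of Fig. 6, the angle of arrival follows from the time delay τ, the microphone spacing d, and the speed of sound via cos θ = V_{s}·τ/d. A sketch (measuring θ from the array axis is our convention, not necessarily the one in Fig. 6):

```python
import math

def angle_of_arrival(delay_seconds, mic_spacing, v_sound=343.0):
    """Estimate the direction of arrival (radians, measured from the
    array axis) of a far-field plane wave from the time difference of
    arrival between two microphones: cos(theta) = v_sound * tau / d."""
    cos_theta = v_sound * delay_seconds / mic_spacing
    cos_theta = max(-1.0, min(1.0, cos_theta))  # clamp small numeric overshoot
    return math.acos(cos_theta)
```

With the rough delay estimates we obtained, only the sign of the delay (right versus left) was reliable; a finer angle estimate would require the more precise sampling control noted in the abstract.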