Implementation of an Acoustic Sensor Array on a Mobile Robotic Device for Estimating Location of a Stationary Target 

Tripp McGehee

Supervisor: Professor Arye Nehorai

Department of Electrical and Systems Engineering
Washington University in St. Louis
Spring 2007

Abstract

 

We implemented an acoustic sensor array on a mobile robotic device for estimating the location of a stationary target. Our goal was to build a robot that could adaptively locate and move towards a stationary sound source. We mounted an array of four omnidirectional microphones, each with its own sound card, on a Lego Mindstorms robot. The measurements were transmitted to a computer through a USB port and processed using LabVIEW. As a first approach, we estimated the time differences of arrival of the sound wave at each of the microphones to estimate the direction of the acoustic source relative to the robot. The computer then transmitted a command to the robot via USB to rotate towards the estimated direction of the sound. We addressed two major technical issues while implementing this project: sensor calibration and simultaneous sampling using four independent sound cards. The results of our first experiment in finding the direction of the acoustic source are encouraging. However, more precise sampling control is required before more sophisticated algorithms, such as maximum likelihood estimation, can be implemented successfully.

 

Overview

 

The overall process can be separated into a few main steps, as illustrated in Fig. 1:

Fig. 1. Schematic representation of the main stages of our project.

 

 

 

Sample a Sound:

 

This stage is formed by three consecutive steps which are illustrated in Fig. 2:

Fig. 2. Main steps for sampling a sound wave using independent USB sound cards.

 

Calibration: The microphones were not sampled simultaneously.  The time delay between the first samples of each microphone was observed to be between 0 msec and 60 msec.  The delay was caused by the way the computer sent the “start sampling” command to the microphones: the command was sent to each microphone separately, so the delay depended on what other processes the computer was running at the time.  Fig. 3 illustrates the acquisition delay between two sound cards fed simultaneously with the same waveform generated by the computer.


Fig. 3. Illustration of the processing delay between samples taken from two independent channels fed simultaneously with the same waveform.

 

 

To correct for this dynamic delay, a calibration signal was sent to each microphone port.  All four microphone ports were connected to a single headphone port with approximately the same length of speaker cable, so that the calibration signal would reach each microphone port at approximately the same time.  Any remaining difference in when the calibration signal reached each port was assumed to be negligible, because it would be caused only by minute differences in the length and impedance of the speaker cables, producing delays insignificant compared to the delays addressed in this project.

 

Once the calibration signal was sampled by all four microphones, an algorithm determined the delay between each microphone’s set of samples.  The sets did not always contain the same number of samples; this was due to the way the algorithm determined the beginning and end of a sound, which is explained later.

 

The algorithm takes the set of samples from one microphone and compares it in turn to each of the other three.  In this way, we find the delay of each microphone’s data relative to the microphone chosen as a reference.  The algorithm begins by determining which sample set is shorter and which is longer.  It then takes a subset of the longer sample set which begins at index 0 and is equal in length to the shorter sample set.  For example, if the shorter sample set contained 1000 samples, the subset of the longer set would start at index 0 and end at index 999.  It then calculates the correlation coefficient between the shorter set and this subset of the longer sample set.  Next it shifts the subset by one sample, so it starts at index 1 and is still equal in length to the shorter sample set.  After calculating the correlation coefficient, it shifts the subset by one sample again.  It continues calculating correlation coefficients and shifting until the subset reaches the end of the longer sample set.  The point at which the highest correlation coefficient is found is the point at which the two sets best line up. Fig. 4 illustrates the cross-correlation function between two channels.

 

 

Once this shifting process is complete, the difference between the starting index and the index at which the highest correlation coefficient was found in the longer sample set is the delay between the two sample sets.  This process is repeated until the delay is found for every microphone relative to the reference microphone.
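
The following Python sketch illustrates this sliding correlation-coefficient search. The original processing was implemented in LabVIEW, so the function and variable names here are illustrative rather than taken from our code:

import numpy as np

def estimate_delay(ref, other):
    """Offset (in samples) at which the two sample sets best line up."""
    # Slide the shorter set across the longer one, one sample at a time,
    # exactly as described above.
    short, long_ = (ref, other) if len(ref) <= len(other) else (other, ref)
    n = len(short)
    best_offset, best_r = 0, -np.inf
    for offset in range(len(long_) - n + 1):
        # Pearson correlation coefficient between the shorter set and the
        # equal-length window of the longer set starting at this offset.
        r = np.corrcoef(short, long_[offset:offset + n])[0, 1]
        if r > best_r:
            best_offset, best_r = offset, r
    return best_offset

Note that the returned offset is an index into the longer of the two sample sets, matching the definition of the delay given above.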

 

The initial delay of each microphone’s sampling start is constant throughout a sampling session.  This means that once these delays are known, we can correct for them when finding arrival-time delays.

 

 

Fig. 4. Illustration of the cross-correlation function between two channels and its delay estimation.

 

Find the Beginning of a Sound: After calibration, the next step is to look for the beginning of a sound. Noise measured by the microphones was assumed to be additive white Gaussian noise, and this was confirmed experimentally.  To distinguish a sound of interest from white Gaussian noise, a relatively quiet room was recorded for a length of time.  From a histogram of these samples, the standard deviation was computed.  We assumed that if a sample was more than six standard deviations away from the mean, then it was not white Gaussian noise.  Therefore, to find the beginning of a sound, we checked every incoming sample against a threshold over six standard deviations away from the mean.
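
The threshold computation can be sketched in Python as follows; the statistics were computed in practice with LabVIEW, and the names here are hypothetical:

import numpy as np

def noise_threshold(quiet_samples, k=6.0):
    """Mean and k-standard-deviation threshold from a quiet-room recording."""
    # Assumes the background is additive white Gaussian noise, as confirmed
    # experimentally above; k = 6 standard deviations per our assumption.
    mu = float(np.mean(quiet_samples))    # noise mean
    sigma = float(np.std(quiet_samples))  # noise standard deviation
    return mu, k * sigma

An incoming sample s is then flagged as a possible sound whenever abs(s - mu) exceeds the threshold.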

 

Find the End of a Sound: Since the noise is assumed to be white Gaussian, the six-standard-deviation threshold can be used again.  If the samples fall below the threshold for a period of time, we assume that the sound has ended.
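
A combined Python sketch of the beginning- and end-of-sound logic follows. The hold length, i.e. how many consecutive samples must stay below the threshold before we declare the sound over, is an arbitrary example value:

def segment_sound(samples, mu, threshold, hold=1000):
    """Return (start, end) sample indices of a detected sound, or None."""
    start, quiet = None, 0
    for i, s in enumerate(samples):
        loud = abs(s - mu) > threshold
        if start is None:
            if loud:
                start = i            # first sample beyond the threshold: onset
        else:
            quiet = 0 if loud else quiet + 1
            if quiet >= hold:        # sustained silence: the sound has ended
                return start, i - hold
    return None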

 

 

Signal Processing:

This stage is formed by three consecutive steps which are illustrated in Fig. 5:

 

Fig. 5. Schematic illustration of the main steps for estimating the sound location.

 

 

Sound Detection Error Check: The Sample a Sound process detailed in the previous section happens independently for each microphone.  This means that one microphone can register a sound while the others register nothing.  This can happen for a number of reasons.  For example, if a person tapped one microphone, it would register a very strong signal, but none of the others would register anything beyond normal noise.  This would be a faulty sound measurement, and the very strong signal should be discarded.

 

The correct detection case is when all the microphones detect a sound at approximately the same time and all the detected sounds are approximately the same length.  To determine whether this case has been reached, we use the steps detailed under the Sample a Sound section.
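
An illustrative Python sketch of this check; the specific tolerance values are example choices, since we only require the detections to be approximately simultaneous and of approximately equal length:

def detections_consistent(starts, lengths,
                          max_start_spread=0.05, max_len_ratio=1.5):
    """Check that all four microphones registered the same sound.

    starts  : detection start times (seconds), one per microphone
    lengths : detected sound durations (seconds), one per microphone
    """
    if len(starts) != 4 or len(lengths) != 4:
        return False                                 # a microphone missed the sound
    if max(starts) - min(starts) > max_start_spread:
        return False                                 # onsets too far apart in time
    if max(lengths) / min(lengths) > max_len_ratio:
        return False                                 # detected lengths disagree
    return True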

 

Find the Delays: Once all microphones have recorded a sound, an algorithm similar to the calibration algorithm is used.  A sliding correlation-coefficient window finds the point at which the sounds best line up, giving the delay in arrival time between each microphone and the reference.
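
Reusing the estimate_delay sketch from the calibration section, the arrival-time delays corrected for the per-card start-up offsets might be computed as follows (illustrative names):

def arrival_delays(recordings, calib_offsets):
    """Arrival-time delays (in samples) relative to microphone 0.

    recordings    : the four detected sample sets
    calib_offsets : per-card start-up delays measured during calibration
    """
    ref = recordings[0]
    # Subtracting the calibration offset isolates the acoustic delay,
    # per the correction described in the calibration section.
    return [estimate_delay(ref, rec) - (calib_offsets[k] - calib_offsets[0])
            for k, rec in enumerate(recordings[1:], start=1)]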

 

Find the Sound Location: Our experiment did not reach the point of computing the coordinates of the sound location, which would have been our next step. We were, however, able to find a rough estimate of the direction of the source using the estimated delay between two microphones. For instance, using only the sign of the estimated delay we could determine whether the source was to the right or to the left. In addition, if we know a priori that the source is far from the array, then we can assume that the acoustic field at the sensor positions is well described by a plane wave, so we can estimate its direction of arrival using the distance between the sensors and the speed of sound, Vs. Fig. 6 illustrates a linear array of two microphones sensing a far-field acoustic wave arriving from direction θ, where C(s1, s2) is the cross-correlation between the measurements s1 and s2 taken from microphone 1 and microphone 2, respectively.

 

 

Fig. 6. Linear array formed by two microphones for estimating the delay and angle of arrival of a far-field acoustic wave.
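
Under the plane-wave assumption, the delay and the array geometry determine the angle of arrival. The Python sketch below assumes θ is measured from the broadside of the two-microphone array, so the delay satisfies τ = d·sin(θ)/Vs; the sign and axis conventions are assumptions, since they depend on the exact geometry of Fig. 6:

import numpy as np

def direction_of_arrival(tau, d, vs=343.0):
    """Angle of arrival theta (radians) from the inter-microphone delay.

    tau : arrival-time delay between the two microphones (seconds)
    d   : microphone spacing (meters)
    vs  : speed of sound (m/s); 343 m/s is a typical room-temperature value

    Assumes theta measured from broadside, so tau = d * sin(theta) / vs.
    """
    # Clip guards against |vs * tau / d| > 1 caused by noisy delay estimates.
    return np.arcsin(np.clip(vs * tau / d, -1.0, 1.0))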