
Head related impulse response for binaural audio

I am working with audio digital signal processing and binaural audio processing. I am still learning the basics. Right now, the idea is to do deconvolution and get an impulse response.

Please see the attached screenshot (the diagram from the source linked below).

Detailed description of what is happening:

Here, an exponential sweep signal is played back through a loudspeaker, and the playback is recorded using a microphone. The recorded signal is extended by zero padding (probably to double the original length), and the original exponential sweep signal is extended in the same way. FFTs are taken of both (the extended recording and the extended original), the two FFTs are divided, and we get the room transfer function. Finally, an inverse FFT is taken and some windowing is performed to get the impulse response.

My question:

I am having difficulty implementing this diagram in Python. How would you divide two FFTs? Is it possible? I can probably do steps like the zero padding and the FFTs, but I suspect I am not going about it the right way. I also do not understand the windowing and the "discard second half" step.

Could anyone show me how to implement this in Python with a sweep signal? Even a small example with a few plots would help me get the idea.

Source of this image: http://www.four-audio.com/data/MF/aes-swp-english.pdf

Thanks in advance, Sanket Jain

This is a little over my head, but maybe the following bits of advice can help.

First, I came across a very helpful amount of sample code in Steve Smith's book The Scientist and Engineer's Guide to Digital Signal Processing. It covers a range of operations, from the basics of convolution to the FFT algorithm itself. The sample code is in BASIC, not Python, but the BASIC is perfectly readable and should be easy to translate.

I'm not entirely sure about the specific calculation you describe, but many operations in this realm (when dealing with multiple signals) turn out to simply employ addition or subtraction of constituent elements. To get an authoritative answer, I think you will have better luck at the Signal Processing Stack Exchange site or at one of the forums at DSP Related.

If you do get an answer elsewhere, it might be good to either recap it here or delete this question entirely to reduce clutter.

Yes, dividing two FFT spectra is possible and actually quite easy to implement in Python (but with some caveats). Simply put: since convolving two time signals corresponds to multiplying their spectra, deconvolution can conversely be realized by dividing the spectra.
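To make the convolution/division relationship concrete, here is a minimal numerical check. The signals `a` and `b` are random test data chosen only for illustration (an assumption, not part of the measurement setup); the check relies on the FFT length being at least as long as the linear convolution.

```python
import numpy as np
from numpy.fft import rfft, irfft

rng = np.random.default_rng(0)
a = rng.standard_normal(50)   # stands in for the excitation signal
b = rng.standard_normal(20)   # stands in for the impulse response

n_lin = len(a) + len(b) - 1   # length of the linear convolution
n_fft = 128                   # FFT length >= n_lin avoids circular wrap-around

# convolution in time == multiplication of spectra
direct = np.convolve(a, b)
via_fft = irfft(rfft(a, n_fft) * rfft(b, n_fft), n_fft)[:n_lin]

# deconvolution == division of spectra (recovers b from direct and a)
b_rec = irfft(rfft(direct, n_fft) / rfft(a, n_fft), n_fft)[:len(b)]
```

With noise-free data and a broadband `a`, `via_fft` matches `direct` and `b_rec` matches `b` to within floating-point error.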

Here is an example for a simple deconvolution with numpy:

(x is your excitation sweep signal and y is the recorded sweep signal, from which you want to obtain the impulse response.)

import numpy as np
from numpy.fft import rfft, irfft

# define length of FFT (zero padding): at least double length of input
input_length = np.size(x)
n = int(np.ceil(np.log2(input_length))) + 1
N_fft = 2 ** n

# transform 
# real fft: N real input -> N/2+1 complex output (single sided spectrum)
# real ifft: N/2+1 complex input -> N real output
X_f = rfft(x, N_fft)
Y_f = rfft(y, N_fft)

# deconvolve
H = Y_f / X_f

# backward transform
h = irfft(H, N_fft)

# truncate to original length
h = h[:input_length]
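For experimenting without a real recording, the following sketch generates a synthetic exponential sweep and a simulated "recording", then runs the same deconvolution steps. The sample rate `fs`, duration `T`, band edges `f1`/`f2`, and the toy impulse response `h_true` are all assumptions made up for this example; a real measurement would use the actual recorded `y` instead.

```python
import numpy as np
from numpy.fft import rfft, irfft

fs = 8000               # sample rate in Hz (assumed)
T = 1.0                 # sweep duration in seconds (assumed)
f1, f2 = 20.0, 3900.0   # sweep start/stop frequencies in Hz (assumed)

# exponential sine sweep from f1 to f2
t = np.arange(int(fs * T)) / fs
R = np.log(f2 / f1)
x = np.sin(2 * np.pi * f1 * T / R * (np.exp(t * R / T) - 1.0))

# toy "room": direct sound plus two echoes (made up for the example)
h_true = np.zeros(64)
h_true[0], h_true[10], h_true[30] = 1.0, 0.5, -0.25
y = np.convolve(x, h_true)   # simulated noise-free recording

# zero-padded FFT length: next power of two, doubled
input_length = np.size(x)
n = int(np.ceil(np.log2(input_length))) + 1
N_fft = 2 ** n

# transform, divide, transform back, truncate
X_f = rfft(x, N_fft)
Y_f = rfft(y, N_fft)
H = Y_f / X_f
h = irfft(H, N_fft)[:input_length]
```

Because the simulated recording contains no noise, `h[:64]` reproduces `h_true` closely. To look at the result, `h` can be plotted against `np.arange(len(h)) / fs`, for example with matplotlib.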

This simple solution is practical but can (and should) be improved. One problem is that the noise floor is boosted at frequencies where X_f has a low amplitude. For example, if your exponential sine sweep starts at 100 Hz, then for the frequency bins below that frequency you are dividing by (almost) zero. One simple remedy is to first invert X_f, apply a band-limiting filter (highpass + lowpass) to remove the "boost areas", and then multiply the result by Y_f:

# deconvolve
Xinv_f = 1 / X_f
Xinv_f = Xinv_f * bandlimit_filter
H = Y_f * Xinv_f
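The snippet above leaves bandlimit_filter undefined. One possible way to build it, sketched here as an assumption rather than the answer's actual method, is a raised-cosine band-pass mask evaluated on the rfft frequency bins. The helper names `make_bandlimit_filter` and `deconvolve_bandlimited` and the `rolloff` parameter are hypothetical.

```python
import numpy as np
from numpy.fft import rfft, irfft, rfftfreq

def make_bandlimit_filter(n_fft, fs, f_low, f_high, rolloff=0.2):
    """Hypothetical helper: real-valued mask, 0 out of band, 1 in band."""
    f = rfftfreq(n_fft, d=1.0 / fs)   # frequency of each rfft bin in Hz
    edges = [f_low * (1.0 - rolloff), f_low, f_high, f_high * (1.0 + rolloff)]
    # linear ramp 0 -> 1 -> 1 -> 0, clamped to 0 outside the edges
    ramp = np.interp(f, edges, [0.0, 1.0, 1.0, 0.0])
    # shape the ramps into smooth raised-cosine transitions
    return 0.5 - 0.5 * np.cos(np.pi * ramp)

def deconvolve_bandlimited(x, y, fs, f_low, f_high):
    """Hypothetical helper: band-limited spectral division."""
    input_length = np.size(x)
    N_fft = 2 ** (int(np.ceil(np.log2(input_length))) + 1)
    X_f = rfft(x, N_fft)
    Y_f = rfft(y, N_fft)
    bandlimit_filter = make_bandlimit_filter(N_fft, fs, f_low, f_high)
    # invert X_f only where the mask is non-zero, so bins that are
    # discarded anyway can never produce inf or nan
    Xinv_f = np.zeros_like(X_f)
    keep = bandlimit_filter > 0
    Xinv_f[keep] = bandlimit_filter[keep] / X_f[keep]
    H = Y_f * Xinv_f
    return irfft(H, N_fft)[:input_length]

# example mask for a 1024-point FFT at fs = 8 kHz, pass band 100 Hz to 3 kHz
mask = make_bandlimit_filter(1024, 8000.0, 100.0, 3000.0)
```

The band edges would normally be chosen to match the start and stop frequencies of the sweep. Note that the resulting impulse response is band-limited as well, so it will not contain information outside the pass band.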

Regarding the distortion: a nice property of the exponential sine sweep is that harmonic distortion products generated during the measurement (e.g. by nonlinearities in the loudspeaker) show up as smaller "side" responses before the "main" response after deconvolution (see this for more details). These side responses can simply be removed with a time window. If the "main" response has no delay (it starts at t=0), the side responses wrap around and appear at the end of the whole iFFT output, so you remove them by windowing out the second half.
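A minimal sketch of the "discard second half" step might look like this. The function name `keep_first_half` and the fade length are assumptions for illustration; the short half-Hann fade-out is one common way to avoid an abrupt cut, not something prescribed by the diagram.

```python
import numpy as np

def keep_first_half(h_full, fade_len=256):
    """Hypothetical helper: drop the second half of the iFFT output
    (where the distortion side responses end up) and fade out the
    tail of the part that is kept."""
    n_keep = len(h_full) // 2
    h = np.array(h_full[:n_keep], dtype=float)
    fade_len = min(fade_len, n_keep)
    if fade_len > 0:
        # falling half of a Hann window: starts at 1, decays towards 0
        k = np.arange(fade_len)
        fade = 0.5 * (1.0 + np.cos(np.pi * k / fade_len))
        h[-fade_len:] *= fade
    return h

# usage with a dummy full-length iFFT result
h_full = np.ones(1024)
h_windowed = keep_first_half(h_full, fade_len=128)
```

This assumes `h_full` is the untruncated output of the inverse FFT. If the main response is delayed, the window position would have to be shifted accordingly.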

I cannot guarantee that this is 100% correct from a signal-theory point of view, but I think it gets the point across, and it works ;)
