Python: finding audio frequency and amplitude over time

Question

Here is what I would like to do. I would like to find the audio frequency and amplitude of a .wav file at, say, every 1 ms of that file and save the results to a file. I have graphed frequency vs. amplitude and amplitude over time, but I cannot figure out frequency over time. My end goal is to read that file and use the amplitude to adjust variables and the frequency to select which variables are being used; that seems to be the easy part. I have been using numpy, audiolab, matplotlib, etc., with FFTs, but I just cannot figure this one out. Any help is appreciated! Thank you!

Answer 1

Use an STFT with overlapping windows to estimate the spectrogram. To save yourself the trouble of rolling your own, you can use the specgram function of Matplotlib's mlab module. It's important to use a window short enough that the audio is approximately stationary within it, and the buffer size should be a power of 2 to make efficient use of a common radix-2 FFT. 512 samples (about 10.67 ms at 48 ksps, or 93.75 Hz per bin) should suffice. For a sampling rate of 48 ksps, overlap by 464 samples to evaluate a sliding window every 1 ms (i.e. shift by 48 samples).
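The figures quoted above follow directly from the sampling rate and the window length. A minimal sketch of that arithmetic, with variable names chosen here purely for illustration:

```python
fs = 48000   # sampling rate in samples per second
nfft = 512   # window (buffer) length in samples, a power of 2

window_ms = 1000.0 * nfft / fs   # duration of one window in milliseconds
bin_hz = fs / nfft               # spacing between adjacent FFT bins in Hz
hop = fs // 1000                 # samples to advance for a 1 ms step
noverlap = nfft - hop            # samples shared by consecutive windows

print(round(window_ms, 2), bin_hz, hop, noverlap)  # 10.67 93.75 48 464
```

A shorter window would improve time resolution but widen the bins, so the two cannot both be made arbitrarily fine.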

Edit:

Here's an example that uses mlab.specgram on an 8-second signal that has one tone per second, from 2 kHz up to 16 kHz. Note the response at the transients. I've zoomed in at 4 seconds to show the response in more detail. The frequency shifts at precisely 4 seconds, but it takes a buffer length (512 samples; approx. +/- 5 ms) for the transient to pass. This illustrates the kind of spectral/temporal smearing caused by non-stationary transitions as they pass through the buffer. Additionally, you can see that even when the signal is stationary there's the problem of spectral leakage caused by windowing the data. A Hamming window function was used to minimize the side lobes of the leakage, but this also widens the main lobe.

import numpy as np
from matplotlib import mlab, pyplot

#Python 2.x:
#from __future__ import division

Fs = 48000
N = 512
f = np.arange(1, 9) * 2000
t = np.arange(8 * Fs) / Fs 
x = np.empty(t.shape)
for i in range(8):
    x[i*Fs:(i+1)*Fs] = np.cos(2*np.pi * f[i] * t[i*Fs:(i+1)*Fs])

w = np.hamming(N)
ov = N - Fs // 1000 # e.g. 512 - 48000 // 1000 == 464
Pxx, freqs, bins = mlab.specgram(x, NFFT=N, Fs=Fs, window=w, 
                                 noverlap=ov)

# plot the spectrogram in dB

Pxx_dB = 10 * np.log10(Pxx)  # convert power to decibels
pyplot.subplots_adjust(hspace=0.4)

pyplot.subplot(211)
ex1 = bins[0], bins[-1], freqs[0], freqs[-1]
pyplot.imshow(np.flipud(Pxx_dB), extent=ex1)
pyplot.axis('auto')
pyplot.axis(ex1)
pyplot.xlabel('time (s)')
pyplot.ylabel('freq (Hz)')

#zoom in at t=4s to show transient

pyplot.subplot(212)
n1, n2 = int(3.991/8*len(bins)), int(4.009/8*len(bins))
ex2 = bins[n1], bins[n2], freqs[0], freqs[-1]
pyplot.imshow(np.flipud(Pxx_dB[:,n1:n2]), extent=ex2)
pyplot.axis('auto')
pyplot.axis(ex2)
pyplot.xlabel('time (s)')
pyplot.ylabel('freq (Hz)')

pyplot.show()
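To get from a spectrogram to the file the question asks for, one option is to take the strongest bin of each 1 ms frame as the "frequency" and its scaled magnitude as the "amplitude". The sketch below is only one way to do that and makes several assumptions: the input is 16-bit PCM, the helper names `read_wav_mono16` and `track_peak` are hypothetical, the STFT is a plain NumPy version rather than mlab.specgram, and the strongest bin is only meaningful when a single tone dominates.

```python
import os
import tempfile
import wave

import numpy as np


def read_wav_mono16(path):
    """Read a 16-bit PCM WAV file (assumed format); return (rate, first channel)."""
    with wave.open(path, "rb") as wf:
        fs = wf.getframerate()
        channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = np.frombuffer(raw, dtype="<i2").astype(np.float64) / 32768.0
    return fs, samples[::channels]  # samples are interleaved; keep channel 0


def track_peak(x, fs, nfft=512, step_ms=1.0):
    """Return (times, peak frequency, peak amplitude), one entry per step."""
    hop = int(round(fs * step_ms / 1000.0))       # 48 samples at 48 ksps
    window = np.hamming(nfft)
    starts = np.arange(0, len(x) - nfft + 1, hop)
    # Gather all overlapping frames at once: shape (n_frames, nfft).
    frames = x[starts[:, None] + np.arange(nfft)] * window
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(nfft, d=1.0 / fs)
    peak = spectrum.argmax(axis=1)                # strongest bin per frame
    # Scale so that a bin-centred sine of amplitude A reads close to A.
    amplitudes = 2.0 * spectrum[np.arange(len(starts)), peak] / window.sum()
    times = (starts + nfft / 2.0) / fs            # centre of each window
    return times, freqs[peak], amplitudes


# Demo: write a 1-second, 3 kHz tone of amplitude 0.5 to a WAV file, then analyse it.
fs = 48000
t = np.arange(fs) / fs
tone = 0.5 * np.cos(2 * np.pi * 3000 * t)
tmpdir = tempfile.mkdtemp()
wav_path = os.path.join(tmpdir, "tone.wav")
with wave.open(wav_path, "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(fs)
    wf.writeframes((tone * 32767).astype("<i2").tobytes())

rate, samples = read_wav_mono16(wav_path)
times, peak_freqs, amplitudes = track_peak(samples, rate)

# One row per millisecond: time (s), dominant frequency (Hz), amplitude.
out_path = os.path.join(tmpdir, "track.csv")
np.savetxt(out_path, np.column_stack([times, peak_freqs, amplitudes]),
           delimiter=",", header="time_s,freq_hz,amplitude")
```

Each row is spaced 1 ms apart but still summarizes about 10.67 ms of audio, so changes faster than the window length will be smeared as described above. Frequencies are quantized to the 93.75 Hz bin spacing, and a tone that falls between bins will read slightly low in amplitude.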

Answer 1
7 2011-08-07 07:03:00