Is this the correct way to read an audio file's FFT? (python + wav)

Question

The audio file is a 16-bit mono PCM file. Sample rates vary, and each file is 10-30 ms long.

import struct
from pydub import AudioSegment
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack

sound = AudioSegment.from_wav("3000hz.wav")

raw_data = sound.raw_data# needs to be mono
sample_rate = sound.frame_rate
sample_size = sound.sample_width
channels = sound.channels

fmt = "%ih" % sound.frame_count() * channels
amplitudes= struct.unpack(fmt, raw_data)
yVals = scipy.fftpack.fft(amplitudes)

plt.plot(abs(yVals[:(len(yVals)/2)-1]),'r')
plt.show()

The output with a 3000 Hz WAV file (taken from an online sine wave generator) is a decent-looking FFT, but it spikes at 9000, not 3000. This factor-of-3 offset is consistent in other tests. Is this OK? And is the code correct?

Answer 1

When you call plt.plot() with only a y array and no corresponding x array, it uses 0, 1, ..., N-1 as the x values. That is not what we want here: we want the frequency on the x-axis.

Let's call the x value you currently see in the plot the "bin index". Let the length of the array be N and the sampling frequency be fs. When calculating an FFT, bin index 0 corresponds to a frequency of 0 Hz, and bin index 1 corresponds to fs / N Hz. This is because the FFT has N values spanning 0 Hz up to (but not including) fs Hz, so each step is fs / N Hz. The next bin thus corresponds to 2 * fs / N Hz, and so on. The last bin, N-1, is at (N-1)/N * fs Hz, so almost fs Hz.
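As a quick sanity check of that mapping, here is a minimal sketch of the conversion in both directions. The helper names bin_to_frequency and frequency_to_bin, the 44100 Hz sample rate, and the 3-second length are assumptions chosen for illustration, not values from the question. Under those assumptions a 3000 Hz tone lands on bin index 9000, which would be consistent with the factor of 3 seen in the plot if the test file happens to be about 3 seconds long.

```python
def bin_to_frequency(k, n, fs):
    """Frequency in Hz of bin index k for an n-point FFT at sample rate fs."""
    return k * fs / n


def frequency_to_bin(f, n, fs):
    """Bin index at which a tone of f Hz shows up in an n-point FFT."""
    return f * n / fs


fs = 44100      # assumed sample rate in Hz (not stated in the question)
n = 3 * fs      # assumed length: 3 seconds of samples

step_hz = bin_to_frequency(1, n, fs)        # spacing between bins, fs / n
peak_bin = frequency_to_bin(3000, n, fs)    # where a 3000 Hz tone appears
peak_hz = bin_to_frequency(peak_bin, n, fs) # converting back gives the tone
```

The spacing fs / n is 1 / duration, so the bin index of a tone is simply its frequency multiplied by the length of the recording in seconds.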

To plot the amplitude spectrum against frequency, we need to create a frequency vector that contains the real frequency for each bin index. Luckily, scipy.fftpack contains a function for that, fftfreq:

freq = scipy.fftpack.fftfreq(n=N, d=1.0 / fs)

Then we can modify the call to plt.plot() to use freq as the x values instead of 0 ... N-1:

plt.plot(freq, abs(yVals), 'r')

With that, the peak should be at the correct position.
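To check this without a WAV file, the following sketch runs the same steps on a synthetic sine wave. It uses numpy.fft instead of scipy.fftpack (a substitution, chosen so the example depends only on NumPy; fft and fftfreq take the same arguments here), and the sample rate, duration, and tone frequency are assumed values.

```python
import numpy as np

fs = 44100          # assumed sample rate in Hz
duration = 3.0      # assumed length in seconds
tone_hz = 3000.0    # test tone, as in the question

# Synthesize the tone instead of reading it from a file.
t = np.arange(int(fs * duration)) / fs
signal = np.sin(2 * np.pi * tone_hz * t)

spectrum = np.fft.fft(signal)
freq = np.fft.fftfreq(n=signal.size, d=1.0 / fs)

# Look for the peak among the non-negative frequencies only.
half = signal.size // 2
peak_index = int(np.argmax(np.abs(spectrum[:half])))
peak_hz = freq[peak_index]

# plt.plot(freq, np.abs(spectrum), 'r') would now show the peak near tone_hz.
```

Here peak_index is the raw bin index that a plot without x values would show, while peak_hz is the value on a properly scaled frequency axis.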

If you only want to see a single-sided spectrum, you can crop both freq and yVals, as the code in the question already does for yVals.
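A minimal sketch of that cropping follows. The helper name single_sided is hypothetical, and numpy.fft is used in place of scipy.fftpack as an assumption to keep the example NumPy-only. Note the integer division: in Python 3, len(yVals)/2 is a float and cannot be used as a slice index, so // is needed.

```python
import numpy as np


def single_sided(spectrum, fs):
    """Return (freq, magnitude) for the non-negative half of an FFT result."""
    n = len(spectrum)
    half = n // 2   # integer division, required for slicing in Python 3
    freq = np.fft.fftfreq(n=n, d=1.0 / fs)
    return freq[:half], np.abs(spectrum[:half])


# Usage with an assumed 1000 Hz tone sampled at 8000 Hz for one second.
fs = 8000
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 1000 * t)

freq_pos, magnitude = single_sided(np.fft.fft(signal), fs)
peak_hz = freq_pos[np.argmax(magnitude)]

# plt.plot(freq_pos, magnitude, 'r') would plot the single-sided spectrum.
```

For real-valued input, the discarded upper half mirrors the kept half, so no information is lost by cropping.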

Answer 1 posted 2019-02-06 07:03:37