
How to get time/freq from FFT in Python

I've got a little problem managing FFT data. I was looking at many examples of how to do an FFT, but I couldn't get what I want from any of them. I have a random wave file with a 44kHz sample rate and I want to get the magnitude of N harmonics every X ms, let's say 100ms should be enough. I tried this code:

import scipy.io.wavfile as wavfile
import numpy as np
import pylab as pl

rate, data = wavfile.read("sound.wav")
t = np.arange(len(data[:,0]))*1.0/rate                    # time axis for the whole file (not used below)
p = 20*np.log10(np.abs(np.fft.rfft(data[:2048, 0])))      # dB magnitude of the first 2048 samples, channel 0
f = np.linspace(0, rate/2.0, len(p))                      # frequency axis from 0 Hz up to Nyquist
pl.plot(f, p)
pl.xlabel("Frequency(Hz)")
pl.ylabel("Power(dB)")
pl.show()

This was the last example I used; I found it somewhere on Stack Overflow. The problem is that this gets the magnitude I want and the frequency, but no time at all. As far as I know, FFT analysis is 3D, and this is the "merged" result of all harmonics. I get this:

X-axis = Frequency, Y-axis = Magnitude, Z-axis = Time (invisible)

From my understanding of the code, t is time, and it seems like that, but it is not needed in the code (we may need it later, though). p is an array of powers (or magnitudes), but it seems like some average of all magnitudes for each frequency in f, which is the array of frequencies. I don't want an averaged/merged value; I want the magnitude of N harmonics every X milliseconds.

Long story short, what we can get: one magnitude for all frequencies (a single spectrum).

What we want: all magnitudes of N frequencies, including the time at which each magnitude is present.

The result should look like this array: [time, frequency, amplitude]. So, in the end, if we want 3 harmonics it would look like:

[0,100,2.85489] #100Hz harmonic has 2.85489 amplitude on 0ms
[0,200,1.15695] #200Hz ...
[0,300,3.12215]
[100,100,1.22248] #100Hz harmonic has 1.22248 amplitude on 100ms
[100,200,1.58758]
[100,300,2.57578]
[200,100,5.16574]
[200,200,3.15267]
[200,300,0.89987]

Visualization is not needed; the result should just be arrays (or hashes/dictionaries) as listed above.
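For reference, here is a minimal sketch of one way to produce exactly that kind of [time, frequency, amplitude] list with scipy.signal.spectrogram (discussed in the answers below). The file name sound.wav, the 100/200/300 Hz targets and the ~100 ms block size are assumptions for illustration, not part of the original question.

import numpy as np
from scipy import signal
from scipy.io import wavfile

rate, data = wavfile.read("sound.wav")
mono = data[:, 0] if data.ndim > 1 else data       # use one channel

# One spectrum roughly every 100 ms (nperseg samples per block, no overlap).
nperseg = int(rate * 0.1)
f, t, Sxx = signal.spectrogram(mono, fs=rate, nperseg=nperseg,
                               noverlap=0, mode='magnitude')

rows = []
for ti, time_s in enumerate(t):
    for target_hz in (100, 200, 300):              # the N harmonics of interest
        fi = np.argmin(np.abs(f - target_hz))      # nearest frequency bin
        rows.append([int(round(time_s * 1000)), target_hz, Sxx[fi, ti]])

# rows is now a list of [time_ms, frequency_Hz, magnitude] entries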

Further to @Paul R's answer, scipy.signal.spectrogram is a spectrogram function in scipy's signal processing module.

The example at the above link is as follows:

from scipy import signal
import matplotlib.pyplot as plt
import numpy as np

# Generate a test signal, a 2 Vrms sine wave whose frequency linearly
# changes with time from 1kHz to 2kHz, corrupted by 0.001 V**2/Hz of
# white noise sampled at 10 kHz.

fs = 10e3
N = 1e5
amp = 2 * np.sqrt(2)
noise_power = 0.001 * fs / 2
time = np.arange(N) / fs
freq = np.linspace(1e3, 2e3, N)
x = amp * np.sin(2*np.pi*freq*time)
x += np.random.normal(scale=np.sqrt(noise_power), size=time.shape)


# Compute and plot the spectrogram.

f, t, Sxx = signal.spectrogram(x, fs)
plt.pcolormesh(t, f, Sxx)
plt.ylabel('Frequency [Hz]')
plt.xlabel('Time [sec]')
plt.show()

[Resulting spectrogram plot of the test signal]
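The three returned arrays can also be used directly, without plotting; a short sketch of reading one value out of Sxx (the 1.5 kHz and 5 s targets are just example values):

import numpy as np

# Sxx is indexed as Sxx[frequency_bin, time_bin]; f and t give the bin centres.
fi = np.argmin(np.abs(f - 1.5e3))   # bin nearest 1.5 kHz
ti = np.argmin(np.abs(t - 5.0))     # column nearest t = 5 s
print(f[fi], t[ti], Sxx[fi, ti])    # frequency, time, spectral value (PSD by default)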

It looks like you're trying to implement a spectrogram, which is a sequence of power spectrum estimates, typically implemented with a succession of (usually overlapping) FFTs. Since you only have one FFT (spectrum), you have no time dimension yet. Put your FFT code in a loop, and process one block of samples (e.g. 1024) per iteration, with a 50% overlap between successive blocks. The sequence of generated spectra will then be a 3D array of time vs frequency vs magnitude.

I'm not a Python person, but I can give you some pseudo code which should be enough to get you coding:

N = length of data input
N_FFT = no of samples per block (== FFT size, e.g. 1024)
i = 0 ;; i = index of spectrum within 3D output array
for block_start = 0 to N - N_FFT
    block_end = block_start + N_FFT
    get samples from block_start .. block_end
    apply window function to block (e.g. Hamming)
    apply FFT to windowed block
    calculate magnitude spectrum (20 * log10(sqrt(re*re + im*im)))
    store spectrum in output array at index i
    block_start += N_FFT / 2            ;; NB: 50% overlap
    i++
 end
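For completeness, a minimal Python sketch of that pseudocode, assuming data is a 1-D (mono) sample array and rate its sample rate, e.g. from scipy.io.wavfile.read; the names N_FFT, hop and spectra are only illustrative:

import numpy as np

N_FFT = 1024                       # samples per block (FFT size)
hop = N_FFT // 2                   # 50% overlap between successive blocks
window = np.hamming(N_FFT)         # window function applied to each block

spectra = []                       # one magnitude spectrum per block
for block_start in range(0, len(data) - N_FFT + 1, hop):
    block = data[block_start:block_start + N_FFT] * window
    mag = np.abs(np.fft.rfft(block))
    spectra.append(20 * np.log10(mag + 1e-12))   # dB magnitude; small floor avoids log(0)

spectra = np.array(spectra)                      # shape: (num_blocks, N_FFT//2 + 1)
times = np.arange(len(spectra)) * hop / rate     # block start times in seconds
freqs = np.fft.rfftfreq(N_FFT, d=1.0 / rate)     # frequency of each bin in Hz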

Edit: So it seems this returns values, but they don't fit the audio file at all. Even though they can be used as magnitudes on a spectrogram, they won't work, for example, in those classic audio visualizers you can see in many music players. I also tried matplotlib's pylab for the spectrogram, but the result is the same.

import os
import wave
import pylab
import math
from numpy import amax
from numpy import amin

def get_wav_info(wav_file,mi,mx):
    wav = wave.open(wav_file, 'r')
    frames = wav.readframes(-1)
    sound_info = pylab.fromstring(frames, 'Int16')  # raw 16-bit samples, channels interleaved
    frame_rate = wav.getframerate()
    wav.close()
    spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate)
    # print (frequency-bin index, time in ms, power) for the first 20 bins
    n = 0
    while n < 20:
        for index,power in enumerate(spectrum[n]):
            print("%s,%s,%s" % (n,int(round(t[index]*1000)),math.ceil(power*100)/100))
        n += 1

get_wav_info("wave.wav",1,20)

Any tips on how to obtain dB values that are usable for visualization? Basically, we apparently have everything we need from the code above; the question is just how to make it return normal values. Ignore mi and mx, as these just adjust the values in the array to fit into the mi..mx interval; that would be for visualization use. If I am correct, spectrum in this code returns an array of arrays containing the amplitude of each frequency from the freqs array, present at the times given by the t array. But how does the value work: is it really an amplitude if it returns these weird values, and if so, how can it be converted to dB, for example?

tl;dr I need output like the visualizers music players have, but it doesn't need to work in real time; I just want the data, but the values don't fit the wav file.

Edit2: I noticed there's one more issue. For a 90-second wav, the t array contains times up to 175.x, which seems very weird considering that frame_rate is correct for the wav file. So now we have 2 problems: spectrum doesn't seem to return correct values (maybe it will fit once we get the correct time), and t seems to cover exactly double the duration of the wav.
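A quick way to confirm the second problem (a sketch, reusing sound_info and frame_rate from inside get_wav_info above): wave.readframes returns both channels interleaved for a stereo file, so pylab.fromstring yields twice as many samples as there are frames, and the implied duration, and with it the t axis, comes out doubled.

# Samples divided by sample rate should equal the file length in seconds;
# with interleaved stereo read as one mono stream it is about twice that.
duration_s = len(sound_info) / float(frame_rate)
print(duration_s)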

Fixed: Case completely solved.

import pylab
from scipy.io import wavfile

wav_file = "wave.wav"                 # same file as in the earlier snippet
frame_rate, snd = wavfile.read(wav_file)
sound_info = snd[:, 0]                # use only the first channel
spectrum, freqs, t, im = pylab.specgram(sound_info, NFFT=1024, Fs=frame_rate,
                                        noverlap=5, mode='magnitude')

Specgram needed a little adjustment, and I loaded only one channel with the scipy.io library (instead of the wave library); feeding the interleaved stereo frames in as a single mono stream was also what made t run to double the file's length. Also, without mode set to 'magnitude', it returns 10log10 instead of 20log10, which is the reason it didn't return correct values.
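If dB values are still wanted for a visualizer-style display, the magnitude spectrogram can be converted afterwards; a small sketch (the 1e-12 floor is only there to avoid log of zero, and the 100 Hz lookup is just an example):

import numpy as np

# `spectrum` from pylab.specgram(..., mode='magnitude') is linear amplitude
# with shape (len(freqs), len(t)); convert it to dB with 20*log10.
spectrum_db = 20 * np.log10(spectrum + 1e-12)

# e.g. the level of the bin nearest 100 Hz in the first time column:
fi = np.argmin(np.abs(freqs - 100.0))
print(t[0], freqs[fi], spectrum_db[fi, 0])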
