
Plotting a fast Fourier transform in Python

I have access to NumPy and SciPy and want to create a simple FFT of a data set. I have two lists: one contains the y values and the other contains the timestamps for those y values.

What is the simplest way to feed these lists into a SciPy or NumPy method and plot the resulting FFT?

I have looked up examples, but they all rely on creating a set of fake data with a certain number of data points, frequency, etc., and don't really show how to do it with just a set of data and the corresponding timestamps.

I have tried the following example:

import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft

# Number of sample points
N = 600

# Sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
yf = fft(y)
# N//2 (integer division): linspace and slicing need an int in Python 3
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()

But when I change the argument of fft to my data set and plot it, I get extremely odd results, and it appears the frequency scaling may be off. I am unsure.

Here is a pastebin of the data I am attempting to FFT:

http://pastebin.com/0WhjjMkb http://pastebin.com/ksM4FvZS

When I use fft() on the whole thing, it just has a huge spike at zero and nothing else.

Here is my code:

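import numpy as np
import matplotlib.pyplot as plt
from scipy.fftpack import fft, fftfreq

# yInterp is the data interpolated onto a uniform grid xInterp (not shown here)

## Perform FFT with SciPy
signalFFT = fft(yInterp)

## Get power spectral density
signalPSD = np.abs(signalFFT) ** 2

## Get frequencies corresponding to signal PSD
fftFreq = fftfreq(len(signalPSD), spacing)

## Get positive half of frequencies (compare the array fftFreq, not the function fftfreq)
i = fftFreq > 0

## Plot the PSD in dB
plt.figure(figsize=(8, 4))
plt.plot(fftFreq[i], 10*np.log10(signalPSD[i]))
#plt.xlim(0, 100)
plt.xlabel('Frequency [Hz]')
plt.ylabel('PSD [dB]')
plt.show()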

Spacing is just equal to xInterp[1]-xInterp[0].

So I ran a functionally equivalent form of your code in an IPython notebook:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack

# Number of samplepoints
N = 600
# sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
yf = scipy.fftpack.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)

fig, ax = plt.subplots()
ax.plot(xf, 2.0/N * np.abs(yf[:N//2]))
plt.show()

I get what I believe to be very reasonable output.


It's been longer than I care to admit since I was in engineering school thinking about signal processing, but spikes at 50 and 80 are exactly what I would expect. So what's the issue?

In response to the raw data and comments being posted:

The problem here is that you don't have periodic data. You should always inspect the data that you feed into any algorithm to make sure that it's appropriate.

import pandas
import matplotlib.pyplot as plt
#import seaborn
%matplotlib inline

# the OP's data
x = pandas.read_csv('http://pastebin.com/raw.php?i=ksM4FvZS', skiprows=2, header=None).values
y = pandas.read_csv('http://pastebin.com/raw.php?i=0WhjjMkb', skiprows=2, header=None).values
fig, ax = plt.subplots()
ax.plot(x, y)


The important thing about fft is that it can only be applied to data in which the timestamps are uniform (i.e. uniform sampling in time, like what you have shown above).
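A quick way to check this before calling fft is to look at the spacing between consecutive timestamps. The helper below is a hypothetical sketch: the name `is_uniformly_sampled` and the relative tolerance `rtol=1e-6` are assumptions, not part of any library.

```python
import numpy as np


def is_uniformly_sampled(t, rtol=1e-6):
    """Hypothetical helper: True if consecutive timestamps are (nearly) equally spaced."""
    dt = np.diff(np.asarray(t, dtype=float))
    # Every step must match the mean step to within rtol (no absolute slack).
    return bool(np.allclose(dt, dt.mean(), rtol=rtol, atol=0.0))


# Synthetic examples: a regular 800 Hz grid and 600 random, sorted times.
t_uniform = np.arange(600) / 800.0
t_random = np.sort(np.random.default_rng(0).uniform(0.0, 0.75, 600))

print(is_uniformly_sampled(t_uniform))
print(is_uniformly_sampled(t_random))
```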

In case of non-uniform sampling, please use a function for fitting the data. There are several tutorials and functions to choose from:

https://github.com/tiagopereira/python_tips/wiki/Scipy%3A-curve-fitting http://docs.scipy.org/doc/numpy/reference/generated/numpy.polyfit.html

If fitting is not an option, you can directly use some form of interpolation to interpolate the data onto a uniform sampling:

https://docs.scipy.org/doc/scipy-0.14.0/reference/tutorial/interpolate.html
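As a minimal sketch of the interpolation route, assuming linear interpolation is acceptable for your data, NumPy's `np.interp` can put irregular samples onto an evenly spaced grid before the FFT. The data here are synthetic (a 50 Hz sine at 2000 random times) and the grid size `n = 1024` is an arbitrary choice:

```python
import numpy as np

# Synthetic, irregularly sampled data (assumption: stands in for real timestamps/values).
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 2000))
y = np.sin(2 * np.pi * 50.0 * t)

n = 1024                                   # number of points on the uniform grid (arbitrary)
t_uniform = np.linspace(t[0], t[-1], n)    # evenly spaced grid over the same time span
y_uniform = np.interp(t_uniform, t, y)     # linear interpolation onto that grid
dt = t_uniform[1] - t_uniform[0]           # actual grid step, used for the frequency axis

# Single-sided amplitude spectrum of the resampled signal.
spectrum = np.abs(np.fft.rfft(y_uniform)) * 2.0 / n
freq = np.fft.rfftfreq(n, d=dt)
print(freq[np.argmax(spectrum)])
```

The grid step `dt` is read back from the grid itself, so the frequency axis from `rfftfreq` stays consistent with the samples actually fed to the FFT.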

When you have uniform samples, you will only have to worry about the time delta (t[1] - t[0]) of your samples. In this case, you can directly use the fft functions:

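import numpy as np
import matplotlib.pyplot as plt

# y: uniformly sampled data, t: the matching timestamps
Y    = np.fft.fft(y)
freq = np.fft.fftfreq(len(y), t[1] - t[0])

plt.figure()
plt.plot(freq, np.abs(Y))     # magnitude
plt.figure()
plt.plot(freq, np.angle(Y))   # phase
plt.show()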

This should solve your problem.

The high spike that you see is due to the DC (non-varying, i.e. freq = 0) portion of your signal. It's an issue of scale. If you want to see the non-DC frequency content for visualization, you may need to plot from offset 1 rather than offset 0 of the signal's FFT.
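To illustrate, here is a sketch using a signal with a constant offset of 10 (sampled on `T * np.arange(N)` rather than `linspace`, an assumption of this sketch). Subtracting the mean zeroes the DC bin and leaves every other bin unchanged. Note that the `2.0/N` scaling reports the DC term at twice its true value (about 20 instead of 10), because DC has no negative-frequency mirror:

```python
import numpy as np

N = 600
T = 1.0 / 800.0
x = T * np.arange(N)
y = 10 + np.sin(50.0 * 2.0 * np.pi * x) + 0.5 * np.sin(80.0 * 2.0 * np.pi * x)

# Single-sided amplitude spectra with and without the mean (DC offset).
amp = 2.0 / N * np.abs(np.fft.fft(y))[:N // 2]
amp_centered = 2.0 / N * np.abs(np.fft.fft(y - y.mean()))[:N // 2]

print(amp[0], amp_centered[0])   # DC bin before and after removing the mean
```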

Modifying the example given above by @PaulH:

import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack

# Number of samplepoints
N = 600
# sample spacing
T = 1.0 / 800.0
x = np.linspace(0.0, N*T, N)
y = 10 + np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
yf = scipy.fftpack.fft(y)
xf = np.linspace(0.0, 1.0/(2.0*T), N//2)

plt.subplot(2, 1, 1)
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.subplot(2, 1, 2)
# Skip index 0 (the DC bin) so the remaining peaks are visible
plt.plot(xf[1:], 2.0/N * np.abs(yf[0:N//2])[1:])
plt.show()

The output plots: [Figure: the FFT of the signal plotted with the DC component, and then with it removed (skipping freq = 0)]

Another way is to visualize the data on a log scale:

Using:

plt.semilogy(xf, 2.0/N * np.abs(yf[0:N//2]))

Will show: [Figure: the same spectrum with a logarithmic magnitude axis]

Just as a complement to the answers already given, I would like to point out that it is often important to play with the number of FFT bins (and hence the bin size). It makes sense to test a range of values and pick the one that best suits your application. Often, it is of the same order of magnitude as the number of samples. This was assumed by most of the answers given, and it produces good, reasonable results. In case you want to explore this, here is my version of the code:

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack

fig = plt.figure(figsize=[14,4])
N = 600           # Number of samplepoints
Fs = 800.0
T = 1.0 / Fs      # Sample spacing (sample period); N*T is the total duration.
N_fft = 80        # Number of FFT bins (chooses granularity); N_fft < N, so fft() uses only the first N_fft samples
x = np.linspace(0, N*T, N)     # the interval
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)   # the signal

# removing the mean of the signal
mean_removed = np.ones_like(y)*np.mean(y)
y = y - mean_removed

# Compute the fft.
yf = scipy.fftpack.fft(y,n=N_fft)
xf = np.arange(0,Fs,Fs/N_fft)

##### Plot the fft #####
ax = plt.subplot(121)
pt, = ax.plot(xf,np.abs(yf), lw=2.0, c='b')
p = plt.Rectangle((Fs/2, 0), Fs/2, ax.get_ylim()[1], facecolor="grey", fill=True, alpha=0.75, hatch="/", zorder=3)
ax.add_patch(p)
ax.set_xlim((ax.get_xlim()[0],Fs))
ax.set_title('FFT', fontsize= 16, fontweight="bold")
ax.set_ylabel('FFT magnitude (power)')
ax.set_xlabel('Frequency (Hz)')
plt.legend((p,), ('mirrored',))
ax.grid()

##### Close up on the graph of fft#######
# This is the same plot as above, but truncated at the max frequency + an offset.
offset = 1    # just to help the visualization. Nothing important.
ax2 = fig.add_subplot(122)
ax2.plot(xf,np.abs(yf), lw=2.0, c='b')
ax2.set_xticks(xf)
ax2.set_xlim(-1,int(Fs/6)+offset)
ax2.set_title('FFT close-up', fontsize= 16, fontweight="bold")
ax2.set_ylabel('FFT magnitude (power) - log')
ax2.set_xlabel('Frequency (Hz)')
ax2.grid()

plt.yscale('log')

The output plots: [Figure: 'FFT' panel (left) and log-scale 'FFT close-up' panel (right)]
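One detail worth knowing when choosing `N_fft`: like `scipy.fftpack.fft`, `np.fft.fft(y, n=...)` truncates the input when `n` is smaller than its length and zero-pads it when `n` is larger. So with `N_fft = 80` only the first 80 of the 600 samples are transformed. A small sketch checking both cases, with the bin spacing `Fs / n_fft` printed for a few assumed values:

```python
import numpy as np

Fs = 800.0
N = 600
x = np.arange(N) / Fs
y = np.sin(50.0 * 2.0 * np.pi * x) + 0.5 * np.sin(80.0 * 2.0 * np.pi * x)

for n_fft in (80, 600, 1200):
    Y = np.fft.fft(y, n=n_fft)   # n < N truncates y, n > N zero-pads it
    print(n_fft, "points -> bin spacing", Fs / n_fft, "Hz, len", len(Y))

# n=80 transforms only the first 80 samples...
same = np.allclose(np.fft.fft(y, n=80), np.fft.fft(y[:80]))
# ...and n=1200 is the same as appending 600 zeros.
padded = np.allclose(np.fft.fft(y, n=1200),
                     np.fft.fft(np.concatenate([y, np.zeros(600)])))
```

Zero-padding makes the frequency grid denser, but it does not separate peaks that the original record length cannot resolve.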

I've built a function that deals with plotting the FFT of real signals. The extra bonus of my function relative to the previous answers is that you get the actual amplitude of the signal.

Also, because a real signal is assumed, the FFT is symmetric, so we can plot only the positive side of the x-axis:

import matplotlib.pyplot as plt
import numpy as np
import warnings


def fftPlot(sig, dt=None, plot=True):
    # Assumes a real-valued signal: its FFT is conjugate-symmetric,
    # so only the positive half of the frequency axis is required

    if dt is None:
        dt = 1
        t = np.arange(0, sig.shape[-1])
        xLabel = 'freq [cycles/sample]'
    else:
        t = np.arange(0, sig.shape[-1]) * dt
        xLabel = 'freq [Hz]'

    if sig.shape[0] % 2 != 0:
        warnings.warn("signal preferred to be even in size, autoFixing it...")
        t = t[0:-1]
        sig = sig[0:-1]

    sigFFT = np.fft.fft(sig) / t.shape[0]  # Divided by the number of samples for coherent magnitude

    freq = np.fft.fftfreq(t.shape[0], d=dt)

    # Keep only the right (non-negative) half of the frequency axis
    firstNegInd = np.argmax(freq < 0)
    freqAxisPos = freq[0:firstNegInd]
    sigFFTPos = 2 * sigFFT[0:firstNegInd]  # *2 adds back the mirrored negative-frequency half
    sigFFTPos[0] /= 2  # ...except for DC, which has no mirror image

    if plot:
        plt.figure()
        plt.plot(freqAxisPos, np.abs(sigFFTPos))
        plt.xlabel(xLabel)
        plt.ylabel('mag')
        plt.title('Analytic FFT plot')
        plt.show()

    return sigFFTPos, freqAxisPos


if __name__ == "__main__":
    dt = 1 / 1000

    # Build a signal within Nyquist - the result will be the positive FFT with actual magnitude
    f0 = 200  # [Hz]
    t = np.arange(0, 1 + dt, dt)
    sig = 1 * np.sin(2 * np.pi * f0 * t) + \
        10 * np.sin(2 * np.pi * f0 / 2 * t) + \
        3 * np.sin(2 * np.pi * f0 / 4 * t) + \
        7.5 * np.sin(2 * np.pi * f0 / 5 * t)

    # Result in frequencies
    fftPlot(sig, dt=dt)
    # Result in normalized frequency (cycles/sample), if the sampling interval is unknown
    fftPlot(sig)

[Figure: 'Analytic FFT plot' results]
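For real input, NumPy also provides `np.fft.rfft` and `np.fft.rfftfreq`, which return only the non-negative frequencies, so the half-spectrum does not have to be cut out by hand. A sketch of the same amplitude computation on the same four-tone test signal (built directly with an even 1000 samples, an assumption here), doubling every bin except DC and Nyquist:

```python
import numpy as np

dt = 1 / 1000
t = np.arange(1000) * dt        # 1 s of data, even length
f0 = 200                        # [Hz]
sig = (1 * np.sin(2 * np.pi * f0 * t) + 10 * np.sin(2 * np.pi * f0 / 2 * t)
       + 3 * np.sin(2 * np.pi * f0 / 4 * t) + 7.5 * np.sin(2 * np.pi * f0 / 5 * t))

spec = np.fft.rfft(sig) / len(sig)   # non-negative frequencies only
spec[1:-1] *= 2                      # fold in the negative half (not DC, not Nyquist for even N)
freq = np.fft.rfftfreq(len(sig), d=dt)

for f in (40, 50, 100, 200):
    k = int(np.argmin(np.abs(freq - f)))
    print(f, "Hz ->", round(abs(spec[k]), 3))
```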

There are already great solutions on this page, but they all assume the dataset is uniformly (evenly) sampled. I will try to provide a more general example with randomly sampled data. I will also use this MATLAB tutorial as an example:

Adding the required modules:

import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack
import scipy.signal

Generating sample data:

N = 600 # Number of samples
t = np.random.uniform(0.0, 1.0, N) # Assuming the time start is 0.0 and time end is 1.0
S = 1.0 * np.sin(50.0 * 2 * np.pi * t) + 0.5 * np.sin(80.0 * 2 * np.pi * t)
X = S + 0.01 * np.random.randn(N) # Adding noise

Sorting the data set:

order = np.argsort(t)
ts = np.array(t)[order]
Xs = np.array(X)[order]

Resampling:

T = (t.max() - t.min()) / N # Average period
Fs = 1 / T # Average sample rate frequency
f = Fs * np.arange(0, N // 2 + 1) / N # Resampled frequency vector
X_new, t_new = scipy.signal.resample(Xs, N, ts)

Plotting the data and resampled data:

plt.xlim(0, 0.1)
plt.plot(t_new, X_new, label="resampled")
plt.plot(ts, Xs, label="org")
plt.legend()
plt.ylabel("X")
plt.xlabel("t")


Now calculating the FFT:

Y = scipy.fftpack.fft(X_new)
P2 = np.abs(Y / N)
P1 = P2[0 : N // 2 + 1]
P1[1:-1] = 2 * P1[1:-1]  # double every bin except DC and Nyquist (MATLAB: P1(2:end-1))

plt.ylabel("Y")
plt.xlabel("f")
plt.plot(f, P1)


PS: I finally had time to implement a more canonical algorithm to get the Fourier transform of unevenly distributed data. You can see the code, a description, and an example Jupyter notebook here.

I am writing this additional answer to explain the origins of the diffusion of the spikes (spectral leakage) when using the FFT, and especially to discuss the scipy.fftpack tutorial, with which I disagree on one point.

In this example, the recording time is tmax=N*T=0.75. The signal is sin(50*2*pi*x) + 0.5*sin(80*2*pi*x). The frequency spectrum should contain two spikes, at frequencies 50 and 80, with amplitudes 1 and 0.5. However, if the analysed signal does not contain an integer number of periods, diffusion can appear due to the truncation of the signal:

  • Peak 1: 50*tmax=37.5 => frequency 50 is not a multiple of 1/tmax => diffusion is present at this frequency due to signal truncation.
  • Peak 2: 80*tmax=60 => frequency 80 is a multiple of 1/tmax => no diffusion due to signal truncation at this frequency.
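The two bullet points can be checked numerically. This sketch assumes the samples are taken at `t = T * np.arange(N)` (the grid recommended further down) so that only truncation effects remain; it prints the number of periods in the window and the amplitude read at the nearest bin:

```python
import numpy as np

# Setup from the text: N = 600 samples over tmax = 0.75 s, taken on the
# DFT grid t = 0, T, ..., (N-1)*T so the window is exactly tmax long.
N = 600
tmax = 0.75
T = tmax / N
t = T * np.arange(N)
y = np.sin(50.0 * 2.0 * np.pi * t) + 0.5 * np.sin(80.0 * 2.0 * np.pi * t)

# Single-sided amplitude spectrum; bin k sits at frequency k / tmax.
amp = 2.0 / N * np.abs(np.fft.fft(y))[:N // 2]
df = 1.0 / tmax

for f0, a0 in [(50.0, 1.0), (80.0, 0.5)]:
    periods = f0 * tmax          # number of signal periods inside the window
    k = int(round(f0 / df))      # nearest FFT bin
    print(f"{f0:.0f} Hz: {periods:g} periods, nearest bin {k} ({k * df:.2f} Hz), "
          f"amplitude {amp[k]:.3f} (true {a0})")
```

The 80 Hz peak falls exactly on bin 60 and reads about 0.5, while the 50 Hz energy is split between bins 37 and 38, each reading roughly 0.64 instead of 1.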

Here is code that analyses the same signal as in the tutorial (sin(50*2*pi*x) + 0.5*sin(80*2*pi*x)) in three slightly different ways:

  1. The original scipy.fftpack example.
  2. The original scipy.fftpack example with an integer number of signal periods (tmax=1.0 instead of 0.75, to avoid truncation diffusion).
  3. The original scipy.fftpack example with an integer number of signal periods, and where the dates (sample times) and frequencies are taken from FFT theory.

The code:

import numpy as np
import matplotlib.pyplot as plt
import scipy.fftpack

# 1. Linspace
N = 600
# Sample spacing
tmax = 3/4
T = tmax / N # =1.0 / 800.0
x1 = np.linspace(0.0, N*T, N)
y1 = np.sin(50.0 * 2.0*np.pi*x1) + 0.5*np.sin(80.0 * 2.0*np.pi*x1)
yf1 = scipy.fftpack.fft(y1)
xf1 = np.linspace(0.0, 1.0/(2.0*T), N//2)

# 2. Integer number of periods
tmax = 1
T = tmax / N # Sample spacing
x2 = np.linspace(0.0, N*T, N)
y2 = np.sin(50.0 * 2.0*np.pi*x2) + 0.5*np.sin(80.0 * 2.0*np.pi*x2)
yf2 = scipy.fftpack.fft(y2)
xf2 = np.linspace(0.0, 1.0/(2.0*T), N//2)

# 3. Correct positioning of dates relatively to FFT theory ('arange' instead of 'linspace')
tmax = 1
T = tmax / N # Sample spacing
x3 = T * np.arange(N)
y3 = np.sin(50.0 * 2.0*np.pi*x3) + 0.5*np.sin(80.0 * 2.0*np.pi*x3)
yf3 = scipy.fftpack.fft(y3)
xf3 = 1/(N*T) * np.arange(N)[:N//2]

fig, ax = plt.subplots()
# Plotting only the left part of the spectrum to not show aliasing
ax.plot(xf1, 2.0/N * np.abs(yf1[:N//2]), label='fftpack tutorial')
ax.plot(xf2, 2.0/N * np.abs(yf2[:N//2]), label='Integer number of periods')
ax.plot(xf3, 2.0/N * np.abs(yf3[:N//2]), label='Correct positioning of dates')
plt.legend()
plt.grid()
plt.show()

Output:输出:

As can be seen here, even with an integer number of periods some diffusion still remains. This behaviour is due to bad positioning of the dates and frequencies in the scipy.fftpack tutorial. Indeed, in the theory of discrete Fourier transforms:

  • the signal should be evaluated at dates t=0,T,...,(N-1)*T, where T is the sampling period and the total duration of the signal is tmax=N*T. Note that we stop at tmax-T.
  • the associated frequencies are f=0,df,...,(N-1)*df, where df=1/tmax=1/(N*T) is the frequency resolution (the spacing between bins; the sampling frequency itself is 1/T=N*df). All harmonics of the signal should be multiples of df to avoid diffusion.
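These two grids can be written down directly. This sketch (with `N = 600` and `tmax = 1`, as in version 3 above) also compares the frequency grid with `np.fft.fftfreq`, which lists the same bins but wraps the upper half to negative frequencies:

```python
import numpy as np

N = 600
tmax = 1.0
T = tmax / N                       # sampling period
df = 1.0 / tmax                    # frequency resolution (bin spacing)
t = T * np.arange(N)               # t = 0, T, ..., (N-1)*T; the last sample is tmax - T
f = df * np.arange(N)              # f = 0, df, ..., (N-1)*df

f_numpy = np.fft.fftfreq(N, d=T)   # same bins, upper half expressed as negative frequencies
print(t[-1], tmax - T)
print(f[N // 2], f_numpy[N // 2])
```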

In the example above, you can see that using arange instead of linspace avoids additional diffusion in the frequency spectrum. Moreover, the linspace version also shifts the spikes to slightly higher frequencies than they should be, as can be seen in the first picture, where the spikes sit a little to the right of the frequencies 50 and 80.
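A small sketch quantifying the difference for `tmax = 1`: it measures the largest amplitude outside the two expected bins (50 and 80, since `df = 1` Hz here) for both time grids. The `linspace` grid has step `tmax/(N-1)` instead of `T`, so a tone at `f` behaves like one at `f*N/(N-1)`:

```python
import numpy as np

N = 600
tmax = 1.0
T = tmax / N


def amplitude_spectrum(t):
    """Single-sided amplitude spectrum of the test signal sampled at times t."""
    y = np.sin(50.0 * 2.0 * np.pi * t) + 0.5 * np.sin(80.0 * 2.0 * np.pi * t)
    return 2.0 / N * np.abs(np.fft.fft(y))[:N // 2]   # bin k <-> k / tmax Hz


amp_lin = amplitude_spectrum(np.linspace(0.0, N * T, N))   # tutorial grid
amp_ar = amplitude_spectrum(T * np.arange(N))              # DFT grid

# Largest amplitude outside the two expected bins measures the extra diffusion.
mask = np.ones(N // 2, dtype=bool)
mask[[50, 80]] = False
print("leakage, linspace:", amp_lin[mask].max())
print("leakage, arange:  ", amp_ar[mask].max())

# Where an 80 Hz tone appears when the linspace samples are treated as spaced by T.
print("apparent 80 Hz on the linspace grid:", 80.0 * N / (N - 1))
```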

I'll just conclude that the usage example should be replaced by the following code (which is less misleading, in my opinion):

import numpy as np
from scipy.fftpack import fft

# Number of sample points
N = 600
T = 1.0 / 800.0
x = T*np.arange(N)
y = np.sin(50.0 * 2.0*np.pi*x) + 0.5*np.sin(80.0 * 2.0*np.pi*x)
yf = fft(y)
xf = 1/(N*T)*np.arange(N//2)
import matplotlib.pyplot as plt
plt.plot(xf, 2.0/N * np.abs(yf[0:N//2]))
plt.grid()
plt.show()

Output (the second spike is no longer diffused):

I think this answer still brings some additional explanation of how to apply the discrete Fourier transform correctly. Obviously, my answer is too long and there are always more things to say (ewerlopes talked briefly about aliasing, for instance, and a lot can be said about windowing), so I'll stop.
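As a brief illustration of windowing (one common choice, not the only one), this sketch applies NumPy's Hann window `np.hanning` to the original `tmax = 0.75` signal and compares the leakage far from the peaks with the unwindowed FFT. Dividing by `w.sum()` instead of `N` is an assumed amplitude normalization that compensates for the window's gain:

```python
import numpy as np

N = 600
T = 1.0 / 800.0                  # tmax = N*T = 0.75 s, so 50 Hz does not fall on a bin
x = T * np.arange(N)
y = np.sin(50.0 * 2.0 * np.pi * x) + 0.5 * np.sin(80.0 * 2.0 * np.pi * x)

w = np.hanning(N)                # Hann window
amp_rect = 2.0 / N * np.abs(np.fft.fft(y))[:N // 2]
amp_hann = 2.0 / w.sum() * np.abs(np.fft.fft(y * w))[:N // 2]   # undo the window's gain

far = slice(10, 30)              # bins well away from both peaks (37.5 and 60)
print("max leakage, no window:", amp_rect[far].max())
print("max leakage, Hann:     ", amp_hann[far].max())
```

The price is a wider main lobe: each peak now spreads over a few neighbouring bins.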

I think that it is very important to deeply understand the principles of the discrete Fourier transform when applying it, because we all know too many people who add factors here and there in order to obtain what they want.

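Statement: the technical posts on this site are published under the CC BY-SA 4.0 license; if you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM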