How to extract features from FFT?

I am gathering data from X, Y and Z accelerometer sensors sampled at 200 Hz. The three axes are combined into a single signal called 'XYZ_Acc'. I followed tutorials on how to transform a time-domain signal into the frequency domain using the scipy fftpack library.

The code I'm using is below:

import numpy as np
import pandas as pd
from scipy.fftpack import fft

# get a 500ms slice from dataframe
sample500ms = df.loc[pd.to_datetime('2019-12-15 11:01:31.000'):pd.to_datetime('2019-12-15 11:01:31.495'), 'XYZ_Acc']

f_s = 200              # sensor sampling frequency 200 Hz
T   = 0.005            # 5 milliseconds between successive observations, T = 1/f_s
N   = 100              # 100 samples in 0.5 seconds

# bin k sits at k*f_s/N, so the first N//2 bins run from 0 Hz up to (but not including) f_s/2
f_values = np.linspace(0.0, f_s/2, N//2, endpoint=False)
fft_values = fft(sample500ms.to_numpy())
fft_mag_values = 2.0/N * np.abs(fft_values[0:N//2])

Then I plot the magnitude against the frequency:

import matplotlib.pyplot as plt

fig_fft = plt.figure(figsize=(5,5))
ax = fig_fft.add_axes([0,0,1,1])
ax.plot(f_values,fft_mag_values)
plt.show()

Screenshot: [image not available: FFT magnitude plotted against frequency]

My difficulty now is how to extract features out of this data, such as Irregularity, Fundamental Frequency, Flux...

Can someone point me in the right direction?

Update 06/01/2019 - adding more context to my question.

I'm relatively new to machine learning, so any feedback is appreciated. X, Y, Z are linear acceleration signals, sampled at 200 Hz from a smartphone. I'm trying to detect road anomalies by analysing spectral and temporal statistics.

Here's a sample of the csv file, which is parsed into a pandas dataframe with the timestamp as the index.

X,Y,Z,Latitude,Longitude,Speed,timestamp
0.8756,-1.3741,3.4166,35.894833,14.354166,11.38,2019-12-15 11:01:30:750
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:755
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:760
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:765
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:770
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:775
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:780
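
The timestamps in this sample put a colon before the milliseconds, which `pd.to_datetime` may not infer on its own. A minimal sketch of the parsing step, assuming an explicit format string; the inline `csv_text` stands in for the real file and is illustrative only:

```python
import io
import pandas as pd

# Three rows copied from the sample above; in practice this would be the CSV file on disk.
csv_text = """X,Y,Z,Latitude,Longitude,Speed,timestamp
0.8756,-1.3741,3.4166,35.894833,14.354166,11.38,2019-12-15 11:01:30:750
1.0317,-0.2728,1.5602,35.894833,14.354166,11.38,2019-12-15 11:01:30:755
-0.1669,-1.9912,-4.2043,35.894833,14.354166,11.38,2019-12-15 11:01:30:770
"""

df = pd.read_csv(io.StringIO(csv_text))
# %f reads the fractional seconds that follow the last colon.
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d %H:%M:%S:%f")
df = df.set_index("timestamp").sort_index()
```

With a `DatetimeIndex` in place, time-based slicing with `df.loc[start:end]` and `resample` both work directly on the frame.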

In answer to 'francis', two columns are then added via this code:

df['XYZ_Acc_Mag'] = (abs(df['X']) + abs(df['Y']) + abs(df['Z']))
df['XYZ_Acc'] = (df['X'] + df['Y'] + df['Z'])

'XYZ_Acc_Mag' is to be used to extract temporal statistics.

'XYZ_Acc' is to be used to extract spectral statistics.

[Figure: line plot]

Data 'XYZ_Acc_Mag' is then resampled into 0.5-second windows, and temporal stats such as mean, standard deviation, etc. have been extracted into a new dataframe. Pair plots reveal the anomaly shown at time 11:01:35 in the line plot above.

[Figure: pair plot]
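
The resampling step described above can be sketched as follows. This is only an illustration on synthetic data, since the real dataframe is not available; the random values, the choice of statistics and the name `temporal_stats` are assumptions:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real dataframe: 2 s of data at 200 Hz.
index = pd.date_range("2019-12-15 11:01:30", periods=400, freq="5ms")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(400, 3)), index=index, columns=["X", "Y", "Z"])
df["XYZ_Acc_Mag"] = df["X"].abs() + df["Y"].abs() + df["Z"].abs()

# One row of statistics per 0.5 s window.
temporal_stats = df["XYZ_Acc_Mag"].resample("500ms").agg(["mean", "std", "min", "max"])
```

Each row of `temporal_stats` then describes one 0.5 s window and can be fed to a pair plot or a classifier.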

Now back to my original question. I'm resampling data 'XYZ_Acc', also into 0.5-second windows, and obtaining the magnitude array 'fft_mag_values'. The question is: how do I extract temporal features such as Irregularity, Fundamental Frequency and Flux out of it?

Since 'XYZ_Acc' is defined as a linear combination of the components of the signal, taking its DFT makes sense. It is equivalent to using a 1D accelerometer in direction (1,1,1). But a more physical, energy-related viewpoint can be adopted. Computing the DFT is similar to writing the signal as a sum of sines. If the acceleration vector is written as:

$$\vec{a}=\vec{a}_0\;\sin(wt)$$

the corresponding velocity vector can be written as:

$$\vec{v}=-\frac{1}{w}\vec{a}_0\;\cos(wt)$$

and the specific kinetic energy as:

$$e=\frac{1}{2}\|\vec{v}\|^2=\frac{1}{2w^2}\|\vec{a}_0\|^2\cos^2(wt)$$

This approach requires computing the DFT of each component first, and only then the magnitude corresponding to each frequency.
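
To illustrate why the order matters, here is a small sketch comparing the two approaches on a hypothetical frame where the x and y axes vibrate in opposite directions. The test signal and the variable names are assumptions made for the example:

```python
import numpy as np

fs = 200                      # sampling frequency in Hz, as in the question
n = 100                       # one 0.5 s frame
t = np.arange(n) / fs

# Hypothetical test frame: x and y vibrate at 20 Hz in opposite directions, z is at rest.
acc = np.column_stack([np.sin(2 * np.pi * 20 * t),
                       -np.sin(2 * np.pi * 20 * t),
                       np.zeros(n)])

# Approach from the question: add the axes first, then transform.
summed_spectrum = np.abs(np.fft.rfft(acc.sum(axis=1))) / n

# Energy viewpoint: transform each axis, then add the squared magnitudes.
per_axis = np.fft.rfft(acc, axis=0) / n
power = np.sum(np.abs(per_axis) ** 2, axis=1)

freqs = np.fft.rfftfreq(n, d=1 / fs)
peak_freq = freqs[np.argmax(power)]
```

In this frame the summed signal is identically zero, so its spectrum misses the vibration entirely, while the per-axis power still peaks at 20 Hz.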

Another issue is that the DFT implicitly computes the Discrete Fourier Transform of a periodic signal, built by repeating the frame. In practice the frame is never exactly one period of a periodic signal, so repeating it creates artificial discontinuities where the end of the frame meets its beginning. The effect of these strong discontinuities in the spectral domain, called spectral leakage, can be reduced by windowing the frame. Computing the real-to-complex DFT results in a power distribution featuring peaks at particular frequencies.
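
A short sketch of the leakage effect, assuming a single sine whose frequency falls between two DFT bins and a Hann window. The 31 Hz test frequency and the 10 Hz "far from the peak" threshold are arbitrary choices for the illustration:

```python
import numpy as np

fs = 200
n = 100
t = np.arange(n) / fs

# 31 Hz falls between two DFT bins (the spacing is 2 Hz),
# so the frame does not hold a whole number of periods.
frame = np.sin(2 * np.pi * 31 * t)

def power_spectrum(x):
    """Squared magnitude of the real-input DFT, scaled by the frame length."""
    return np.abs(np.fft.rfft(x) / len(x)) ** 2

rect = power_spectrum(frame)                    # no window
hann = power_spectrum(frame * np.hanning(n))    # Hann window

freqs = np.fft.rfftfreq(n, d=1 / fs)
far = np.abs(freqs - 31) > 10                   # bins far away from the sine frequency

# Share of the total power that leaked far from the peak, with and without the window.
leak_rect = rect[far].sum() / rect.sum()
leak_hann = hann[far].sum() / hann.sum()
```

The windowed frame keeps far less power in the distant bins, at the price of a slightly wider main peak.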

In addition, the frequency of a given peak is better estimated as the mean frequency with respect to power density, as shown in "Why are frequency values rounded in signal using FFT?"
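
As a minimal sketch of that estimate, the helper below averages the frequencies of the bins around a peak, weighted by their power. The function name `weighted_peak_frequency`, the `half_width` parameter and the 30.3 Hz test tone are assumptions made for the example:

```python
import numpy as np

def weighted_peak_frequency(power, freqs, peak, half_width=2):
    """Power-weighted mean frequency over the bins around index `peak`."""
    lo = max(peak - half_width, 0)
    hi = min(peak + half_width + 1, len(power))
    return np.sum(power[lo:hi] * freqs[lo:hi]) / np.sum(power[lo:hi])

fs = 200
n = 100
t = np.arange(n) / fs
frame = np.sin(2 * np.pi * 30.3 * t) * np.hanning(n)   # true frequency lies between bins

power = np.abs(np.fft.rfft(frame)) ** 2
freqs = np.fft.rfftfreq(n, d=1 / fs)
peak = int(np.argmax(power))

coarse = freqs[peak]                                   # limited to the 2 Hz bin spacing
refined = weighted_peak_frequency(power, freqs, peak)
```

Here `coarse` can only land on a multiple of 2 Hz, whereas `refined` falls much closer to the true 30.3 Hz.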

Another tool to estimate fundamental frequencies is the autocorrelation of the signal: it is higher at lags near the periods of the signal. Since the signal is a vector of 3 components, an autocorrelation matrix can be built. It is a 3x3 Hermitian matrix for each time lag and therefore features real eigenvalues. The maxima of the largest eigenvalue can be pictured as the magnitude of vibrations, while the corresponding eigenvector is a complex direction, somewhat similar to the direction of vibrations combined with angular offsets. The angular offset may signal an ellipsoidal vibration.
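
The idea is easier to see on a single axis first. The sketch below is a simplified, one-component version, not the 3x3 matrix approach itself: it computes the autocorrelation through the FFT and reads the period from the strongest peak after the lag-0 lobe. The function name and the 25 Hz test signal are assumptions:

```python
import numpy as np

def fundamental_from_autocorr(x, fs):
    """Estimate the fundamental frequency from the strongest autocorrelation peak."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Zero-padding to 2n makes the FFT product a linear (not circular) autocorrelation.
    spectrum = np.fft.rfft(x, 2 * n)
    acorr = np.fft.irfft(spectrum * np.conj(spectrum))[: n // 2]
    # Skip the lag-0 lobe: start searching at the first negative value.
    negative = np.flatnonzero(acorr < 0)
    if negative.size == 0:
        return None                      # no oscillation detected
    start = negative[0]
    lag = start + np.argmax(acorr[start:])
    return fs / lag

fs = 200
t = np.arange(100) / fs
estimate = fundamental_from_autocorr(np.sin(2 * np.pi * 25 * t), fs)
```

Because the lag is an integer number of samples, this raw estimate is quantised; averaging the lags around the peak, weighted by the autocorrelation, refines it.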

Here is a fake signal, built by adding Gaussian noise and sine waves: [Figure: Gaussian noise and sine waves]

Here is the power density spectrum for a given frame overlapping a sine wave: [Figure: power spectral density of the accelerometer signal]

Here are the resulting eigenvalues of the autocorrelation of the same frame, where the period of the 50 Hz sine wave is visible. Vertical scaling is wrong: [Figure: eigenvalues of the accelerometer autocorrelation]

Here is some sample code:

import matplotlib.pyplot as plt
import numpy as np
import scipy.signal

n = 2000
t = np.linspace(0., n / 200, num=n, endpoint=False)

# an artificial signal, just for tests
ax = 0.3 * np.random.normal(0, 1., n)
ay = 0.3 * np.random.normal(0, 1., n)
az = 0.3 * np.random.normal(0, 1., n)

# a 30 Hz burst on y and a 50 Hz burst on z, each lasting 100 samples
ay[633:733] = ay[633:733] + np.sin(2 * np.pi * 30 * t[633:733])
az[433:533] = az[433:533] + np.sin(2 * np.pi * 50 * t[433:533])

#ax=np.sin(2*np.pi*10*t)
#ay=np.sin(2*np.pi*30*t)
#az=np.sin(2*np.pi*50*t)

plt.plot(t, ax, label='x')
plt.plot(t, ay, label='y')
plt.plot(t, az, label='z')

plt.xlabel('t, s')
plt.ylabel('acc, m.s^-2')
plt.legend()
plt.show()

# splitting the signal into frames of 0.5 s (100 samples at 200 Hz)
noiseheight = 0.
for i in range(n // 100):
    print('frame', i, ' time ', i * 0.5, ' s')
    framea = np.zeros((100, 3))
    framea[:, 0] = ax[i * 100:i * 100 + 100]
    framea[:, 1] = ay[i * 100:i * 100 + 100]
    framea[:, 2] = az[i * 100:i * 100 + 100]

    # for that frame, apply window. Factor 2 so that average remains 1.
    window = np.hanning(100)
    framea = framea * (2 * window)[:, np.newaxis]

    # DFT transform.
    hatacc = np.fft.rfft(framea, axis=0, norm=None)
    # scaling by length of frame.
    hatacc = hatacc / 100.
    # computing the squared magnitude, summed over the three axes:
    # non-zero frequencies are doubled to merge the energy of bin N-k into bin k
    accmag = 2 * np.sum(np.abs(hatacc) ** 2, axis=1)
    accmag[0] = accmag[0] * 0.5

    # first frame says something about noise
    if i == 0:
        noiseheight = 2. * np.max(accmag)
    if np.max(accmag) > noiseheight:
        peaks, peaksdat = scipy.signal.find_peaks(accmag, height=noiseheight)

        timestep = 0.005
        # non-negative frequencies matching the bins returned by rfft
        freq = np.fft.rfftfreq(100, d=timestep)
        # see https://stackoverflow.com/questions/54714169/why-are-frequency-values-rounded-in-signal-using-fft/54775867#54775867
        # frequencies of peaks are better estimated as mean frequency of peak, with respect to power density
        for ind in peaks:
            # five bins centred on the peak, clamped to the ends of the spectrum
            lo, hi = max(ind - 2, 0), min(ind + 3, len(accmag))
            totalweight = np.sum(accmag[lo:hi])
            totalweightedfreq = np.sum(accmag[lo:hi] * freq[lo:hi])
            print('found peak at frequency', totalweightedfreq / totalweight, ' of height', accmag[ind])

        # plotting
        plt.plot(freq[0:50], accmag[0:50], label='||acc||^2')

        plt.xlabel('frequency, Hz')
        plt.ylabel('||acc||^2, m^2.s^-4')
        plt.legend()
        plt.show()

        # another approach to find fundamental frequencies:
        # computing the autocorrelation of the windowed signal and searching for maxima.
        # building the autocorrelation matrix
        autocorr = np.zeros((100, 3, 3), dtype=complex)
        acfft = np.fft.fft(framea, axis=0, norm=None)
        acfft[0, :] = 0.  # drop the zero-frequency bin (the mean) of each component
        for a in range(3):
            for b in range(3):
                autocorr[:, a, b] = np.fft.ifft(acfft[:, a] * np.conj(acfft[:, b]), axis=0, norm=None)
        # at a given lag, the 3x3 matrix autocorr is treated as Hermitian (only its lower triangle is used).
        # Its eigenvalues are real, its unit eigenvectors signal directions of vibrations and phase between components.
        autocorreigval, autocorreigvec = np.linalg.eigh(autocorr, UPLO='L')

        peaks, peaksdat = scipy.signal.find_peaks(autocorreigval[:50, 2], height=0.3 * autocorreigval[0, 2])
        cleared = np.zeros(len(peaks))
        peakperiod = np.zeros(len(peaks))
        for j in range(len(peaks)):
            p = peaks[j]
            # period of the peak, as the mean lag weighted by the largest eigenvalue
            totalweight = autocorreigval[p - 1, 2] + autocorreigval[p, 2] + autocorreigval[p + 1, 2]
            totalweightedperiod = 0.005 * (autocorreigval[p - 1, 2] * (p - 1) + autocorreigval[p, 2] * p + autocorreigval[p + 1, 2] * (p + 1))
            peakperiod[j] = totalweightedperiod / totalweight

        fundfreq = 1
        for j in range(len(peaks)):
            if cleared[j] == 0:
                print("found fundamental frequency :", 1.0 / peakperiod[j], 'eigenvalue', autocorreigval[peaks[j], 2], ' dir vibration ', autocorreigvec[peaks[j], :, 2])
                # mark later peaks whose period is close to a multiple of this one
                for k in range(j, len(peaks)):
                    mm = np.floor(peakperiod[k] / peakperiod[j])
                    if (np.abs(peakperiod[k] - peakperiod[j] * mm) < 0.2 * peakperiod[j]
                            or np.abs(peakperiod[k] - peakperiod[j] * (mm + 1)) < 0.2 * peakperiod[j]):
                        cleared[k] = fundfreq
                fundfreq = fundfreq + 1

        plt.plot(t[i * 100:i * 100 + 100], autocorreigval[:, 2], label='autocorrelation, large eigenvalue')
        plt.plot(t[i * 100:i * 100 + 100], autocorreigval[:, 1], label='autocorrelation, medium eigenvalue')
        plt.plot(t[i * 100:i * 100 + 100], autocorreigval[:, 0], label='autocorrelation, small eigenvalue')

        plt.xlabel('t, s')
        plt.ylabel('acc^2, m^2.s^-4')
        plt.legend()
        plt.show()

The output is:

frame 0  time  0.0  s
frame 1  time  0.5  s
frame 2  time  1.0  s
frame 3  time  1.5  s
frame 4  time  2.0  s
found peak at frequency 50.11249238149811  of height 0.2437842149351196
found fundamental frequency : 50.31467771196368 eigenvalue 47.03344783764712  dir vibration  [-0.11441502+0.00000000e+00j  0.0216911 +2.98101624e-18j
 -0.9931962 -5.95276353e-17j]
frame 5  time  2.5  s
frame 6  time  3.0  s
found peak at frequency 30.027895460975156  of height 0.3252387031089667
found fundamental frequency : 29.60690406120401 eigenvalue 61.51059682797539  dir vibration  [ 0.11384195+0.00000000e+00j -0.98335779-4.34688198e-17j
 -0.14158908+3.87566125e-18j]
frame 7  time  3.5  s
found peak at frequency 26.39622018109896  of height 0.042081187689137545
found fundamental frequency : 67.65844834016518 eigenvalue 6.875616417422696  dir vibration  [0.8102307 +0.00000000e+00j 0.32697001-8.83058693e-18j
 0.48643275-4.76094302e-17j]
frame 8  time  4.0  s
frame 9  time  4.5  s

The 50 Hz and 30 Hz frequencies were caught as 50.11/50.31 Hz and 30.02/29.60 Hz, and the directions are quite accurate as well. The last feature, at 26.39 Hz/67.65 Hz, is likely garbage, as the two methods disagree on its frequency and its magnitude/eigenvalue is much lower.
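
The question also mentions flux and irregularity, which the code above does not compute. Below is a rough sketch, assuming definitions commonly used in audio feature extraction: flux as the distance between the normalised magnitude spectra of two consecutive frames, and irregularity as the squared difference between neighbouring bins relative to the total squared magnitude. These definitions, the function names and the test tones are assumptions; other conventions exist:

```python
import numpy as np

def spectral_flux(prev_mag, mag):
    """Euclidean distance between two magnitude spectra, each normalised to unit sum."""
    p = prev_mag / (np.sum(prev_mag) + 1e-12)
    q = mag / (np.sum(mag) + 1e-12)
    return np.sqrt(np.sum((q - p) ** 2))

def spectral_irregularity(mag):
    """Squared differences between neighbouring bins over the total squared magnitude."""
    return np.sum(np.diff(mag) ** 2) / (np.sum(mag ** 2) + 1e-12)

fs, n = 200, 100
t = np.arange(n) / fs
window = np.hanning(n)
mag_30 = np.abs(np.fft.rfft(np.sin(2 * np.pi * 30 * t) * window))
mag_50 = np.abs(np.fft.rfft(np.sin(2 * np.pi * 50 * t) * window))

flux_same = spectral_flux(mag_30, mag_30)      # identical frames
flux_changed = spectral_flux(mag_30, mag_50)   # the dominant frequency moved
irregularity = spectral_irregularity(mag_30)
```

Flux stays at zero while the spectrum is unchanged and jumps when the dominant frequency moves, which is the kind of frame-to-frame change a road anomaly might produce.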

Regarding monitoring of road surfaces to improve maintenance, I know of a project at my company called Aigle3D. A laser fitted at the back of a van scans the road at highway speed with millimetric accuracy. The van is also fitted with a server, cameras and other sensors, thus providing a huge amount of data on road geometry and defects, presently covering hundreds of km of the French national road network. Detecting and repairing small early defects and cracks may extend the life expectancy of the road at limited cost. If useful, data from the accelerometers of daily users could indeed complement the monitoring system, allowing a faster reaction whenever a large pothole appears.
