Calculating FFT Features from a discrete signal

Question

I have a uniformly sampled signal recorded at 10 Hz (two consecutive data points are 100 milliseconds apart). It is the magnitude of the 3 axes of a 3D gyroscope, and the array contains 30 data points (3 seconds). I plot the spectrum of this series as follows:

import numpy as np
import matplotlib.pyplot as pl

sample_rate = 10
x = np.array([318.45,302.78,316.47,334.14,333.41,326.15,320.07,318.68,314.12,308.64,300.15,304.33,318.42,322.72,329.56,339.18,338.03,343.27,351.44,353.23,352.35,352.88,353.43,352.14,351.28,352.82,353.36,353.35,353.19,353.82])
x = np.array(x) - np.mean(x)
p = np.abs(np.fft.rfft(x))
f = np.linspace(0, sample_rate/2, len(p))
pl.plot(f, p)
pl.show()

Can someone tell me whether I plotted this correctly? I am planning to calculate the following features from the above signal:

DC Component
Spectral Energy
Information Entropy
Dominant frequency components
Principal frequency
Magnitude of the first five components of FFT analysis

Can someone help me extend the above code to calculate those features?

----@RoadRunner66: Please see my questions below, as I could not post a long reply to you----

Thank you for your answer and your code.

Regarding your question, the data come from the gyroscope, which measures the Euler angles.

So (sum x[i]**2 : 3357757.0) is the Spectral Energy? If yes, do I need to normalize it by dividing this number by n (or multiplying by n, as you did)? The two papers below differ in their definitions.

The first paper (first link below) states: "The second frequency-domain feature set was chosen to be spectral energy, which is defined to be the sum of the squared FFT coefficients"

The second paper (second link) puts it differently: "Spectral energy: the squared sum of spectral coefficients divided by the number of samples in a window"
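The two quoted definitions can be compared numerically. The sketch below is only one reading of them: it assumes "squared sum" means the sum of squared magnitudes, that the full two-sided FFT is used, and that the DC component is kept. The papers may differ on any of these points. The variable names are illustrative.

```python
import numpy as np

x = np.array([318.45, 302.78, 316.47, 334.14, 333.41, 326.15, 320.07, 318.68, 314.12, 308.64,
              300.15, 304.33, 318.42, 322.72, 329.56, 339.18, 338.03, 343.27, 351.44, 353.23,
              352.35, 352.88, 353.43, 352.14, 351.28, 352.82, 353.36, 353.35, 353.19, 353.82])
n = len(x)
X = np.fft.fft(x)                        # full two-sided FFT, NumPy's unnormalized convention

# Reading of paper 1: sum of the squared FFT coefficient magnitudes
energy_sum = np.sum(np.abs(X) ** 2)

# Reading of paper 2: the same sum divided by the number of samples in the window
energy_per_window = energy_sum / n

# Parseval's theorem: with this convention the second value equals sum(x**2)
print(f"paper 1 style : {energy_sum:.1f}")
print(f"paper 2 style : {energy_per_window:.1f}")
print(f"sum x[i]**2   : {np.sum(x ** 2):.1f}")
```

Under these assumptions the two definitions differ only by the constant factor n, so for fixed-length windows they carry the same information for a classifier.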

And what about the principal frequency: does it mean the same as the dominant frequency? I guess the principal frequency refers to the single frequency with the highest spectral peak?

I printed the frequencies and the corresponding magnitudes in two rows like this

I think you printed the magnitudes of the first 5, like the ones highlighted in yellow below. I am not sure about the definition of "first 5 components".

If we use the first five consecutive components, as you pointed out, does it make sense to include ones such as those at frequency 0 or 0.666 and feed them into my prediction model (explained below), given that they are very low compared to the others? If the resulting spectrum has clear dominant frequencies, say at 1 Hz and 3 Hz, then the magnitude at 0.5 Hz or 1.5 Hz may be close to zero.

Could the term "Magnitude of the first five components of FFT analysis" mean the "magnitude of the first five dominant components", as I highlighted in blue? Does this term refer to 5 values or to just 1 value (the square root of the sum of the 5 squares)? If it refers to 5 values (very likely), then I think the top five dominant components by magnitude would be a better choice for comparing two signals?

Btw, the second paper also wrote "First 5-FFT coefficients: the first 5 of the fast-Fourier transform coefficients are taken since they capture the main frequency components, and the use of additional coefficients did not improve the accuracies"

To be frank, I am working on a cow activity problem. My strategy is to segment the sensor data into time windows (3, 5, 7... seconds), extract features from each window, and then feed them to the machine learning model.

(My data come from a 3D gyroscope and a 3D accelerometer attached to the neck of the cow; the sensor sampling rate is 10 Hz.)

I want to combine two types of features: time-domain features and frequency-domain features.
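The windowing strategy described above can be sketched as follows. This is a rough illustration, not a recommended pipeline: `split_windows` and `window_features` are hypothetical helpers, the time-domain features (mean, standard deviation, min, max) are placeholders chosen for the example, and a 1-second window is used only so that the 30-sample signal from the question yields more than one window.

```python
import numpy as np

def split_windows(signal, sample_rate, window_seconds):
    """Split a 1-D signal into non-overlapping windows (hypothetical helper)."""
    size = int(round(sample_rate * window_seconds))
    n_windows = len(signal) // size              # an incomplete tail is dropped
    return signal[:n_windows * size].reshape(n_windows, size)

def window_features(w):
    """Concatenate a few time-domain and frequency-domain features for one window."""
    mags = np.abs(np.fft.rfft(w - w.mean())) / len(w)   # one-sided magnitudes, DC removed
    time_feats = [w.mean(), w.std(), w.min(), w.max()]  # placeholder time-domain features
    freq_feats = list(mags[1:6])                        # first five bins after DC
    return np.array(time_feats + freq_feats)

x = np.array([318.45, 302.78, 316.47, 334.14, 333.41, 326.15, 320.07, 318.68, 314.12, 308.64,
              300.15, 304.33, 318.42, 322.72, 329.56, 339.18, 338.03, 343.27, 351.44, 353.23,
              352.35, 352.88, 353.43, 352.14, 351.28, 352.82, 353.36, 353.35, 353.19, 353.82])

windows = split_windows(x, sample_rate=10, window_seconds=1)
features = np.vstack([window_features(w) for w in windows])   # one row per window
print(features.shape)
```

Each row of `features` would then be one training example for the model. Overlapping windows or per-axis features are possible variations that this sketch does not cover.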

I read papers and found the above set of frequency-domain features, which includes the term "Magnitude of the first five components of FFT analysis" (from this paper https://ieeexplore.ieee.org/abstract/document/4663615 ) and from this one https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634510/

The second one refers to it as "First 5-FFT coefficients: the first 5 of the fast-Fourier transform coefficients are taken since they capture the main frequency components, and the use of additional coefficients did not improve the accuracies."

Thank you so much for your reading and answer!

Answer 1

You have done most of the work. This appears to be for a class assignment, so that might explain the sparsity of the data, which are not really a lot to go by.

import numpy as np
import matplotlib.pyplot as plt
# %matplotlib inline   # IPython/Jupyter magic; uncomment when running inside a notebook

sample_rate = 10 #Hz

x = np.array([318.45,302.78,316.47,334.14,333.41,326.15,320.07,318.68,314.12,308.64,
              300.15,304.33,318.42,322.72,329.56,339.18,338.03,343.27,351.44,353.23,
              352.35,352.88,353.43,352.14,351.28,352.82,353.36,353.35,353.19,353.82])
n=len(x)
print(f'number of points n : {n}')
t = np.arange(n) / sample_rate     # sample times 0.0, 0.1, ..., 2.9 s (10 Hz, per your description)
mn=np.mean(x)
print(f' mean = {mn:.3f} unit')
print(f' sum x[i]**2  : {np.sum(x**2) :.1f} unit^2 ')

pr = np.abs(np.fft.rfft(x))/n
print()
print(f' DC peak :  {pr[0]};  this makes choice of normalization 1/n meaningful')

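w = np.full(len(pr), 2.0)        # rfft is one-sided: bins with a mirrored twin count twice
w[0] = 1.0                       # DC has no mirrored twin
if n % 2 == 0: w[-1] = 1.0       # neither does the Nyquist bin when n is even
print(f' n *sum X[k]**2   : {np.sum(w*np.abs(pr)**2)*n :.1f} unit^2 ')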
f = np.linspace(0, sample_rate/2, len(pr))   # 10 Sa/sec, so 5 Hz is Nyquist limit

plt.figure(figsize=(20,5))

plt.subplot(141)
plt.plot(t,x,'.-')
plt.title('original data') 

plt.subplot(142)
plt.plot(f, pr,'.-')
plt.title('spectrum')


plt.subplot(143)
xf = np.array(x) - mn   #  remove the DC
plt.plot(t, xf,'.-')
plt.title('original data with DC removed')

plt.subplot(144)
pr = np.abs(np.fft.rfft(xf))/n
plt.plot(f, pr, '.-')
plt.title('spectrum (DC removed) ')
plt.show()

The DC peak is the same as the average of the original signal at 334 units.

The spectral energy (see Parseval's Theorem ) is the same in time and frequency domain.
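Both statements can be checked numerically. The sketch below assumes NumPy's unnormalized FFT convention and works on the one-sided `rfft` output, which means bins that have a mirrored negative-frequency twin have to be counted twice. The `weights` array is an illustrative name, not part of the original code.

```python
import numpy as np

x = np.array([318.45, 302.78, 316.47, 334.14, 333.41, 326.15, 320.07, 318.68, 314.12, 308.64,
              300.15, 304.33, 318.42, 322.72, 329.56, 339.18, 338.03, 343.27, 351.44, 353.23,
              352.35, 352.88, 353.43, 352.14, 351.28, 352.82, 353.36, 353.35, 353.19, 353.82])
n = len(x)
X = np.fft.rfft(x)                   # one-sided spectrum with n//2 + 1 bins

# DC component: with 1/n normalization it equals the mean of the signal
dc = np.abs(X[0]) / n

# Weights for folding the two-sided spectrum into the one-sided one
weights = np.full(len(X), 2.0)
weights[0] = 1.0                     # the DC bin has no mirrored twin
if n % 2 == 0:
    weights[-1] = 1.0                # neither does the Nyquist bin when n is even

energy_time = np.sum(x ** 2)                          # time-domain energy
energy_freq = np.sum(weights * np.abs(X) ** 2) / n    # frequency-domain energy

print(f"DC = {dc:.3f}, mean = {x.mean():.3f}")
print(f"time-domain energy      : {energy_time:.1f}")
print(f"frequency-domain energy : {energy_freq:.1f}")
```

Without the weights, the one-sided sum undercounts every bin except DC (and Nyquist), so the two energies would only agree approximately.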

I don't know which definition of information entropy you are expected to use (see e.g. approximate entropy).
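One commonly used candidate is the Shannon entropy of the normalized power spectrum (often called spectral entropy). Whether this matches the definition in the papers is an assumption. The sketch below also assumes that the DC bin is excluded and that the logarithm is base 2.

```python
import numpy as np

x = np.array([318.45, 302.78, 316.47, 334.14, 333.41, 326.15, 320.07, 318.68, 314.12, 308.64,
              300.15, 304.33, 318.42, 322.72, 329.56, 339.18, 338.03, 343.27, 351.44, 353.23,
              352.35, 352.88, 353.43, 352.14, 351.28, 352.82, 353.36, 353.35, 353.19, 353.82])

xf = x - x.mean()                          # remove the DC component
power = np.abs(np.fft.rfft(xf)) ** 2       # one-sided power spectrum
power = power[1:]                          # drop the DC bin (about zero after mean removal)

p = power / power.sum()                    # normalize so the bins sum to 1
p_nonzero = p[p > 0]                       # skip empty bins to avoid log(0)
entropy = -np.sum(p_nonzero * np.log2(p_nonzero))

max_entropy = np.log2(len(power))          # reached only by a perfectly flat spectrum
print(f"spectral entropy: {entropy:.3f} bits (maximum {max_entropy:.3f})")
```

A value close to zero indicates energy concentrated in a few bins, while a value close to the maximum indicates a flat, noise-like spectrum.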

The dominant frequencies in the spectrum (after removing DC) are at 1/3 Hz, 1 Hz and 2 Hz, with 1/3 Hz being the largest. I printed the magnitude of the first five.
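For reference, the frequency axis, the ranking of peaks, and the "first five" magnitudes can be computed as sketched below. Two readings are shown because the term is ambiguous: the first five bins after DC, and the five largest bins. Skipping the DC bin and treating "principal frequency" as the single largest peak are assumptions, not definitions taken from the papers.

```python
import numpy as np

sample_rate = 10  # Hz
x = np.array([318.45, 302.78, 316.47, 334.14, 333.41, 326.15, 320.07, 318.68, 314.12, 308.64,
              300.15, 304.33, 318.42, 322.72, 329.56, 339.18, 338.03, 343.27, 351.44, 353.23,
              352.35, 352.88, 353.43, 352.14, 351.28, 352.82, 353.36, 353.35, 353.19, 353.82])
n = len(x)

mags = np.abs(np.fft.rfft(x - x.mean())) / n        # one-sided magnitudes, DC removed
freqs = np.fft.rfftfreq(n, d=1 / sample_rate)       # 0, 1/3, 2/3, ..., 5 Hz

# Reading A: the first five consecutive bins after DC
first_five = mags[1:6]

# Reading B: the five largest bins, ordered by decreasing magnitude
order = np.argsort(mags[1:])[::-1] + 1              # +1 restores the index offset from skipping DC
top_five_idx = order[:5]

# Principal frequency, read here as the frequency of the single largest peak
principal_frequency = freqs[order[0]]

print("first five magnitudes :", np.round(first_five, 3))
print("top five frequencies  :", np.round(freqs[top_five_idx], 3))
print(f"principal frequency   : {principal_frequency:.3f} Hz")
```

`np.fft.rfftfreq` gives the same axis as `np.linspace(0, sample_rate/2, len(pr))` when n is even, as it is here, and stays correct when n is odd.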

An important question for me would be the physical meaning of the data. Are they angles? If yes, in what unit?

Note that one can 'see' the frequency components in the original data at 1/3 Hz (one wave over 3 secs) , 1 Hz (one wave over 1 sec) and 2 Hz (one wave over 1/2 sec), respectively.
