smooth signal and find peaks

Question

Given I have an X and Y array such that:

X = np.array([1,2,3,4,5,6,7,8,9,10,11,12])

and

Y = np.array([-19.9, -19.6, -17.6, -15.9, -19.9, -18.4, -17.7, -16.6, -19.5, -20.4, -17.6, -15.9])

I get a plot like:

Here there are 3 very clear peaks that I can see. I can fit this data using:

# fit polynomial
z = np.polyfit(X, Y, 8)
f = np.poly1d(z)

# calculate new x's and y's
x_new = np.linspace(X[0], X[-1], 100)
y_new = f(x_new)

This gives the following curve, which shows how the signal changes over the course of a year. In this case the data come from rice agriculture, and the 3 peaks correspond to the number of agricultural cycles:

Here I use scipy.signal.argrelextrema to find the peaks and troughs of the curve. However, getting a good fit this way is very manual: I have to inspect the data by eye first in order to choose the polynomial order. I will be repeating this process on a very large number of datasets (hundreds of thousands), so I cannot do this by hand each time.
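The extrema step described above is not shown in the post. A minimal sketch of it, assuming the same degree-8 polynomial and 100-point grid as the fitting snippet and the default strict comparison, might look like this (the names peak_idx and trough_idx are illustrative, not from the original):

```python
import numpy as np
from scipy.signal import argrelextrema

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
Y = np.array([-19.9, -19.6, -17.6, -15.9, -19.9, -18.4,
              -17.7, -16.6, -19.5, -20.4, -17.6, -15.9])

# Fit the polynomial and evaluate it on a denser grid
# (NumPy may warn that a fit of this degree is poorly conditioned)
z = np.polyfit(X, Y, 8)
f = np.poly1d(z)
x_new = np.linspace(X[0], X[-1], 100)
y_new = f(x_new)

# Indices where the curve is strictly greater (or less) than both neighbours
peak_idx = argrelextrema(y_new, np.greater)[0]
trough_idx = argrelextrema(y_new, np.less)[0]

print("peaks at x =", x_new[peak_idx])
print("troughs at x =", x_new[trough_idx])
```

With the default settings the first and last samples are never reported as extrema, so a maximum that sits exactly at the end of the series (as at X = 12 here) may not be counted.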

Furthermore, the number of peaks I have is likely to change. In fact my ultimate goal here is to categorize the datasets I have into the number of peaks I can detect. There are also cases where the signal has more noise.

I have looked into scipy.signal.find_peaks (and related algorithms), but it finds every peak rather than just the major ones, particularly in noisier data. I have also tried Savitzky-Golay (savgol) and Gaussian filters and can get a result, but I often have to specify parameters such as the polynomial order, which is likely to change with the number of peaks.
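For context on the find_peaks behaviour mentioned above: called with no arguments it reports every local maximum, but it also accepts a prominence argument that discards peaks that do not stand out from their surroundings. The sketch below uses a made-up synthetic signal (a slow sine plus a small fast ripple, not the rice data), and the threshold of 1.0 is an assumed value that would still need to be chosen for real data:

```python
import numpy as np
from scipy.signal import find_peaks

# Synthetic stand-in for a noisy series: slow cycle plus small fast ripple
x = np.linspace(0.0, 12.0, 241)
y = 2.0 * np.sin(1.4 * x) + 0.2 * np.sin(25.0 * x)

# Every local maximum, including the ripple
all_peaks, _ = find_peaks(y)

# Only peaks that rise at least 1.0 above their surrounding baseline
major_peaks, props = find_peaks(y, prominence=1.0)

print("all local maxima:", len(all_peaks))
print("major peaks:", len(major_peaks))
print("major peak positions:", x[major_peaks])
print("prominences:", props["prominences"])
```

This replaces the choice of a polynomial order with the choice of a prominence threshold, so it reduces the user input rather than removing it.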

Is there a way to smooth a signal to get an approximation of the number of peaks without having to manually specify polynomial orders etc? Is there an algorithm/method available that can detect general trends without too much user input?

I'm also open to alternative methods if there is a better method than curve fitting. I fear that the result I get out will only be as good as what I put in, and so any general curve fitting approaches will deliver poorer results.

Answer 1

Here is a graphical fitter using your data and a simple equation, a one-term Fourier series with an offset, that appears to give a smooth fit automatically.

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit


xData = numpy.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0])
yData = numpy.array([-19.9, -19.6, -17.6, -15.9, -19.9, -18.4, -17.7, -16.6, -19.5, -20.4, -17.6, -15.9])


# Fourier Series 1 Term (scaled X) from zunzun.com
def func(x, offset, a1, b1, c1):
    return a1 *numpy.sin(c1 * x) + b1 *numpy.cos(c1 * x) + offset


# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
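The code above fits and plots the model but does not count peaks. One way to get a count, sketched here under the assumption that scipy.signal.find_peaks is acceptable for that step, is to evaluate the fitted model on a dense grid and count its local maxima. The helper count_model_peaks and the parameter values below are hypothetical placeholders, not the fitted values from curve_fit:

```python
import numpy as np
from scipy.signal import find_peaks


# Same model as in the answer: one Fourier term plus an offset
def func(x, offset, a1, b1, c1):
    return a1 * np.sin(c1 * x) + b1 * np.cos(c1 * x) + offset


def count_model_peaks(params, x_min, x_max, n_points=500):
    """Evaluate the model on a dense grid and return (count, peak x-positions)."""
    x_dense = np.linspace(x_min, x_max, n_points)
    y_dense = func(x_dense, *params)
    peak_idx, _ = find_peaks(y_dense)
    return len(peak_idx), x_dense[peak_idx]


# Placeholder parameters (offset, a1, b1, c1) chosen for illustration only;
# in practice pass fittedParameters from curve_fit instead
example_params = (-18.0, 2.0, 0.0, 1.4)
n_peaks, peak_x = count_model_peaks(example_params, 1.0, 12.0)
print("number of peaks:", n_peaks)
print("peak positions:", peak_x)
```

Because find_peaks never reports the first or last sample, a maximum that falls exactly on the edge of the x range is not counted, and a single sinusoid can only represent evenly spaced peaks of equal height.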

Answer 2

pip install findpeaks

from findpeaks import findpeaks

X = [-19.9, -19.6, -17.6, -15.9, -19.9, -18.4, -17.7, -16.6, -19.5, -20.4, -17.6, -15.9]

# Initialize
fp = findpeaks(lookahead=1)
# Make the fit
results1 = fp.fit(X)

results1['df']
#       x      y  labx  valley   peak  labx_topology  valley_topology  peak_topology  persistence
#  0    0  -19.9   1.0    True  False            1.0             True          False
#  1    1  -19.6   1.0   False  False            1.0            False          False
#  2    2  -17.6   1.0   False  False            1.0            False          False
#  3    3  -15.9   1.0   False   True            1.0            False           True
#  4    4  -19.9   1.0   False  False            2.0             True          False
#  5    5  -18.4   2.0    True  False            2.0            False          False
#  6    6  -17.7   2.0   False  False            2.0            False          False
#  7    7  -16.6   2.0   False   True            2.0            False           True
#  8    8  -19.5   2.0   False  False            2.0            False          False
#  9    9  -20.4   3.0    True  False            2.0            False          False
# 10   10  -17.6   3.0   False  False            2.0            False          False
# 11   11  -15.9   3.0    True  False            2.0             True          False

# Make plot
fp.plot()

# Initialize
fp = findpeaks(lookahead=1, interpolate=10)
# Make the fit
results2 = fp.fit(X)
# Results
results2['df']
# Make plot
fp.plot()

smooth signal and find peaks

Question

2 answers

solution1
1 ACCEPTED 2019-05-28 00:20:12

solution2
0 2020-06-18 17:40:15
