
Smooth Data and Find Maximum

I have a dataset (see below) of two variables, x and y. I want to find the value of x at which y reaches its maximum. My current approach is simply to look up the x that gives the maximum y. This is not ideal because my data is quite noisy, so I would like to smooth it first and then find the max.

So far, I have tried to use R to smooth my data with npreg (kernel regression) from the np package to obtain this curve:


but I'm not sure how to find the max.

I would like a solution to the following in Python:

1) Smooth the data (doesn't have to be kernel regression)

2) Find the value of x where the max in y occurs using the smoothed data

x   y
-20 0.006561733
-19 -4.48E-08
-18 -4.48E-08
-17 -4.48E-08
-16 0.003281305
-15 0.00164063
-14 0.003280565
-13 0.003282537
-12 -4.48E-08
-11 0.003281286
-10 0.004921239
-9  0.00491897
-8  -1.52E-06
-7  0.004925867
-6  -1.27E-06
-5  0.009839438
-4  0.001643726
-3  -4.48E-08
-2  2.09E-06
-1  -0.001640027
0   0.006559627
1   0.001636958
2   2.36E-06
3   0.003281469
4   0.011481469
5   0.004922279
6   0.018044207
7   0.011483134
8   0.014765087
9   0.008201379
10  0.00492497
11  0.006560482
12  0.009844796
13  0.011483199
14  0.008202129
15  0.001641621
16  0.004921645
17  0.006563377
18  0.006561068
19  0.008201004

I'd run a Gaussian filter over the data to smooth:

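import numpy as np
import scipy.interpolate
import scipy.signal

# x and y are the two columns from the question
# first, make a function to linearly interpolate the data
f = scipy.interpolate.interp1d(x, y)

# resample with 1000 samples
xx = np.linspace(-20, 19, 1000)

# compute the function on this finer interval
yy = f(xx)

# make a gaussian window (200 samples wide, standard deviation of 60 samples);
# older SciPy releases also exposed this as scipy.signal.gaussian
window = scipy.signal.windows.gaussian(200, 60)

# convolve the arrays
smoothed = scipy.signal.convolve(yy, window / window.sum(), mode='same')

# get the maximum
x_max = xx[np.argmax(smoothed)]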
Here's the smoothed result:


The max occurs at 6.93.

There are a whole bunch of other window functions and filtering options in scipy.signal. See the documentation for more.
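As one example of those other options, here is a minimal sketch using a Savitzky-Golay filter (scipy.signal.savgol_filter) directly on the 40 samples from the question. The window_length of 11 and polyorder of 2 are arbitrary choices of mine, not values tuned for this data, so treat them as assumptions to adjust.

```python
import numpy as np
from scipy.signal import savgol_filter

x = np.arange(-20, 20)
y = np.array([
    0.006561733, -4.48e-08, -4.48e-08, -4.48e-08,
    0.003281305, 0.00164063, 0.003280565, 0.003282537,
    -4.48e-08, 0.003281286, 0.004921239, 0.00491897,
    -1.52e-06, 0.004925867, -1.27e-06, 0.009839438,
    0.001643726, -4.48e-08, 2.09e-06, -0.001640027,
    0.006559627, 0.001636958, 2.36e-06, 0.003281469,
    0.011481469, 0.004922279, 0.018044207, 0.011483134,
    0.014765087, 0.008201379, 0.00492497, 0.006560482,
    0.009844796, 0.011483199, 0.008202129, 0.001641621,
    0.004921645, 0.006563377, 0.006561068, 0.008201004,
])

# Fit a quadratic to each sliding window of 11 points (assumed settings);
# window_length must be odd and larger than polyorder.
smoothed = savgol_filter(y, window_length=11, polyorder=2)

# Location of the maximum of the smoothed curve, on the original x grid.
x_at_max = x[np.argmax(smoothed)]
print(x_at_max, smoothed.max())
```

Unlike the resample-and-convolve approach, this keeps the original 40-point grid, so the answer can only be one of the measured x values.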

You might be able to use the smoothing spline functions:

import numpy as np
from scipy import interpolate
x = range(-20,20)
y = [0.006561733, -4.48e-08, -4.48e-08, -4.48e-08, 0.003281305, 0.00164063, 0.003280565, 0.003282537, -4.48e-08, 0.003281286, 0.004921239, 0.00491897, -1.52e-06, 0.004925867, -1.27e-06, 0.009839438, 0.001643726, -4.48e-08, 2.09e-06, -0.001640027, 0.006559627, 0.001636958, 2.36e-06, 0.003281469, 0.011481469, 0.004922279, 0.018044207, 0.011483134, 0.014765087, 0.008201379, 0.00492497, 0.006560482, 0.009844796, 0.011483199, 0.008202129, 0.001641621, 0.004921645, 0.006563377, 0.006561068, 0.008201004]

tck = interpolate.splrep(x, y)  # pass s=<some value> to change the smoothing:
                                # higher = smoother; s=0 (the default when no
                                # weights are given) interpolates, no smoothing

xnew = np.arange(-20, 20, 0.1)
ynew = interpolate.splev(xnew,tck,der=0)

Now xnew and ynew contain a finely sampled version of the fit, and you get the max with:

max_index = np.argmax(ynew)
max_value = ynew[max_index]
max_x = xnew[max_index]
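If picking the max off a sampled grid is too coarse, one alternative (a sketch of mine, not part of the answer above) is to fit a degree-4 smoothing spline with UnivariateSpline, so that its derivative is a cubic spline whose roots() SciPy can compute, and then compare the spline at those stationary points and at the two endpoints. The smoothing factor s below is a rough heuristic (number of points times a noise variance estimated from first differences); it is an assumption, not a recommended value.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.arange(-20, 20, dtype=float)
y = np.array([
    0.006561733, -4.48e-08, -4.48e-08, -4.48e-08,
    0.003281305, 0.00164063, 0.003280565, 0.003282537,
    -4.48e-08, 0.003281286, 0.004921239, 0.00491897,
    -1.52e-06, 0.004925867, -1.27e-06, 0.009839438,
    0.001643726, -4.48e-08, 2.09e-06, -0.001640027,
    0.006559627, 0.001636958, 2.36e-06, 0.003281469,
    0.011481469, 0.004922279, 0.018044207, 0.011483134,
    0.014765087, 0.008201379, 0.00492497, 0.006560482,
    0.009844796, 0.011483199, 0.008202129, 0.001641621,
    0.004921645, 0.006563377, 0.006561068, 0.008201004,
])

# Assumption: estimate the noise variance from first differences and use
# s = (number of points) * (noise variance) as the smoothing factor.
noise_var = np.mean(np.diff(y) ** 2) / 2.0
s = len(x) * noise_var

# Degree 4, so that the derivative is a cubic spline (roots() needs a cubic).
spline = UnivariateSpline(x, y, k=4, s=s)
stationary = spline.derivative().roots()

# The maximum sits at a stationary point or at one of the two endpoints.
candidates = np.concatenate(([x[0]], stationary, [x[-1]]))
max_x = candidates[np.argmax(spline(candidates))]
max_value = float(spline(max_x))
print(max_x, max_value)
```

How smooth the curve is, and therefore where the max lands, depends heavily on s, so it is worth plotting the fit for a few values before trusting the result.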

Sorry I was not able to test this; computer I am using right now doesn't have scipy etc. loaded... Should give you some ideas though.

Reference: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html

I am not completely sure what the main problem to solve is: better smoothing, finding the maximum, or doing it all in Python? Why are you changing to Python if you have made promising progress in R? I have found that R's built-in supsmu function usually does very good non-parametric smoothing. This is how I would do it in R:

smooth <- do.call(supsmu, data)
max.idx <- which.max(smooth$y)
max.point <- c(smooth$x[max.idx], smooth$y[max.idx])
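Since the question asks for Python: I am not aware of a supsmu equivalent in NumPy or SciPy, so the sketch below swaps in a different non-parametric smoother, a Nadaraya-Watson kernel regression with a Gaussian kernel (the same family of method as the npreg call mentioned in the question), written in plain NumPy. The helper name kernel_smooth and the bandwidth of 3.0 are my own illustrative choices, not taken from the answers.

```python
import numpy as np

def kernel_smooth(x, y, x_eval, bandwidth):
    """Nadaraya-Watson estimate of y at x_eval using a Gaussian kernel."""
    # Scaled distance between every evaluation point and every sample.
    diffs = (x_eval[:, None] - x[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs ** 2)
    # Weighted average of y, with one row of weights per evaluation point.
    return (weights @ y) / weights.sum(axis=1)

x = np.arange(-20, 20, dtype=float)
y = np.array([
    0.006561733, -4.48e-08, -4.48e-08, -4.48e-08,
    0.003281305, 0.00164063, 0.003280565, 0.003282537,
    -4.48e-08, 0.003281286, 0.004921239, 0.00491897,
    -1.52e-06, 0.004925867, -1.27e-06, 0.009839438,
    0.001643726, -4.48e-08, 2.09e-06, -0.001640027,
    0.006559627, 0.001636958, 2.36e-06, 0.003281469,
    0.011481469, 0.004922279, 0.018044207, 0.011483134,
    0.014765087, 0.008201379, 0.00492497, 0.006560482,
    0.009844796, 0.011483199, 0.008202129, 0.001641621,
    0.004921645, 0.006563377, 0.006561068, 0.008201004,
])

# Evaluate on a fine grid; the bandwidth (in x units) is an assumed value.
grid = np.linspace(x.min(), x.max(), 1000)
smoothed = kernel_smooth(x, y, grid, bandwidth=3.0)

x_at_max = grid[np.argmax(smoothed)]
print(x_at_max, smoothed.max())
```

A larger bandwidth gives a flatter curve and a smaller one follows the noise more closely, so the reported location of the max will shift with that choice.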
