Python Data Frame: how to find the local maximum in a 2D array

I have a data frame with two columns, x and y. I want to find the local maxima in the x-y plot, as shown in Figure 1 of the attached plot. My approach was to convert each column of the data frame into a separate array. Step 1: my code identifies the index positions of the local maxima in y. Step 2: it looks up the x values at those index positions. That's it. As a result, I could find only two local maxima, but there are three, and my method fails to identify the third one. My question: is there a way to identify the local maxima directly from the 2D array?

My present code:

import numpy as np
from scipy.signal import argrelextrema

x = my_dataframe.iloc[:, 0].values  # convert the x column of the data frame into an array
y = my_dataframe.iloc[:, 1].values  # convert the y column of the data frame into an array

# Step 1: index positions of the local maxima in y
local_y_index = argrelextrema(y, np.greater)
print("Index position of local maximum in y = ", local_y_index[0])

# Step 2: value of x at each local maximum of y
local_x = x[local_y_index[0]]
print("value of x corresponding to local maximum in y = ", local_x)
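The following small sketch uses made-up numbers, not the question's data, and may explain the missing peak. argrelextrema(y, np.greater) only returns points that are strictly greater than both neighbours. A "shoulder", where y keeps increasing but briefly flattens, is therefore never reported.

In the question's y data, the values around index 150 are 779.62, 780.14, 780.39 and 782.50. They keep increasing, which suggests the third feature in Figure 1 is a shoulder of this kind rather than a true local maximum.

```python
import numpy as np
from scipy.signal import argrelextrema

# Made-up data: true peaks at indices 2 and 7, and a "shoulder" around
# index 5 where y still increases, only more slowly.
y = np.array([0.0, 3.0, 5.0, 4.0, 6.0, 6.2, 6.3, 9.0, 7.0])

# Indices where y is strictly greater than both neighbours.
peaks = argrelextrema(y, np.greater)[0]
print(peaks)  # → [2 7]; the shoulder is not reported

# The shoulder shows up only as small positive steps in the 1-step
# differences, never as a change of sign.
print(np.diff(y))
```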

The output is:

Index position of local maximum in y =  [105 197]
value of x corresponding to local maximum in y =  [149.21 281.06]

My question: as shown in Figure 1, my approach above identifies only two local peaks, but there are three. Is there a better approach to identify the local maxima directly from the 2D array of x and y?

[Figure 1: x-y plot of the data]

x = [1.0330e-01, 1.0380e-01, 1.0430e-01, 1.0680e-01, 1.1932e-01, 1.8192e-01,
 3.6365e-01, 5.4539e-01, 7.9191e-01, 1.0384e+00, 1.3626e+00, 1.6869e+00,
 1.7438e+00, 2.0286e+00, 2.4825e+00, 2.9363e+00, 3.4787e+00, 4.0212e+00,
 4.7129e+00, 5.2137e+00, 6.0460e+00, 6.9486e+00, 7.8511e+00, 8.6835e+00,
 1.0092e+01, 1.0418e+01, 1.2153e+01, 1.3888e+01, 1.5623e+01, 1.7358e+01,
 1.9093e+01, 2.0828e+01, 2.2563e+01, 2.4298e+01, 2.6033e+01, 2.7768e+01,
 2.9503e+01, 3.1237e+01, 3.2972e+01, 3.4707e+01, 3.6442e+01, 3.8177e+01,
 3.9912e+01, 4.1647e+01, 4.3382e+01, 4.5117e+01, 4.6852e+01, 4.8587e+01,
 5.0322e+01, 5.2056e+01, 5.3791e+01, 5.5526e+01, 5.7261e+01, 5.8996e+01,
 6.0731e+01, 6.2466e+01, 6.4201e+01, 6.5936e+01, 6.7671e+01, 6.9406e+01,
 7.1141e+01, 7.2875e+01, 7.4610e+01, 7.6345e+01, 7.8080e+01, 7.9815e+01,
 8.1550e+01, 8.3285e+01, 8.5020e+01, 8.6755e+01, 8.8490e+01, 9.0225e+01,
 9.1960e+01, 9.3694e+01, 9.5429e+01, 9.7164e+01, 9.8899e+01, 1.0063e+02,
 1.0237e+02, 1.0410e+02, 1.0584e+02, 1.0757e+02, 1.0931e+02, 1.1104e+02,
 1.1278e+02, 1.1451e+02, 1.1625e+02, 1.1798e+02, 1.1972e+02, 1.2145e+02,
 1.2319e+02, 1.2492e+02, 1.2666e+02, 1.2839e+02, 1.3013e+02, 1.3186e+02,
 1.3360e+02, 1.3533e+02, 1.3707e+02, 1.3880e+02, 1.4054e+02, 1.4227e+02,
 1.4401e+02, 1.4574e+02, 1.4748e+02, 1.4921e+02, 1.5095e+02, 1.5268e+02,
 1.5442e+02, 1.5615e+02, 1.5684e+02, 1.5753e+02, 1.5789e+02, 1.5861e+02,
 1.5934e+02, 1.5962e+02, 1.6056e+02, 1.6136e+02, 1.6256e+02, 1.6309e+02,
 1.6482e+02, 1.6656e+02, 1.6829e+02, 1.7003e+02, 1.7176e+02, 1.7350e+02,
 1.7523e+02, 1.7697e+02, 1.7870e+02, 1.8044e+02, 1.8217e+02, 1.8391e+02,
 1.8564e+02, 1.8738e+02, 1.8911e+02, 1.9085e+02, 1.9258e+02, 1.9432e+02,
 1.9605e+02, 1.9779e+02, 1.9952e+02, 2.0126e+02, 2.0299e+02, 2.0473e+02,
 2.0646e+02, 2.0820e+02, 2.0993e+02, 2.1167e+02, 2.1340e+02, 2.1514e+02,
 2.1687e+02, 2.1861e+02, 2.1927e+02, 2.1993e+02, 2.2034e+02, 2.2103e+02,
 2.2172e+02, 2.2208e+02, 2.2296e+02, 2.2381e+02, 2.2493e+02, 2.2555e+02,
 2.2700e+02, 2.2728e+02, 2.2871e+02, 2.2902e+02, 2.3057e+02, 2.3075e+02,
 2.3164e+02, 2.3249e+02, 2.3422e+02, 2.3596e+02, 2.3769e+02, 2.3943e+02,
 2.4116e+02, 2.4290e+02, 2.4463e+02, 2.4637e+02, 2.4810e+02, 2.4984e+02,
 2.5157e+02, 2.5331e+02, 2.5504e+02, 2.5678e+02, 2.5851e+02, 2.6025e+02,
 2.6198e+02, 2.6371e+02, 2.6545e+02, 2.6718e+02, 2.6892e+02, 2.7065e+02,
 2.7239e+02, 2.7412e+02, 2.7586e+02, 2.7759e+02, 2.7933e+02, 2.8106e+02,
 2.8280e+02, 2.8453e+02, 2.8627e+02, 2.8800e+02, 2.8974e+02, 2.9147e+02,
 2.9321e+02, 2.9494e+02, 2.9668e+02, 2.9841e+02, 3.0015e+02, 3.0188e+02,
 3.0362e+02, 3.0535e+02, 3.0709e+02, 3.0882e+02, 3.1056e+02, 3.1229e+02,
 3.1403e+02, 3.1576e+02, 3.1749e+02, 3.1923e+02, 3.2096e+02, 3.2270e+02,
 3.2443e+02, 3.2617e+02, 3.2790e+02, 3.2964e+02, 3.3137e+02, 3.3311e+02,
 3.3484e+02, 3.3658e+02, 3.4686e+02, 3.4686e+02, 3.4686e+02, 3.4686e+02,
 3.4686e+02, 3.4686e+02, 3.4686e+02, 3.4686e+02, 3.4687e+02]

y = [4.2014e-01, 4.2237e-01, 4.2460e-01, 4.3574e-01, 4.9146e-01, 7.7004e-01,
     1.5788e+00, 2.3874e+00, 3.4842e+00, 4.5808e+00, 6.0228e+00, 7.4647e+00,
     7.7180e+00, 8.9843e+00, 1.1002e+01, 1.3020e+01, 1.5431e+01, 1.7842e+01,
     2.0916e+01, 2.3141e+01, 2.6839e+01, 3.0848e+01, 3.4856e+01, 3.8552e+01,
     4.4807e+01, 4.6254e+01, 5.3953e+01, 6.1650e+01, 6.9344e+01, 7.7035e+01,
     8.4723e+01, 9.2409e+01, 1.0009e+02, 1.0777e+02, 1.1545e+02, 1.2312e+02,
     1.3079e+02, 1.3846e+02, 1.4613e+02, 1.5379e+02, 1.6145e+02, 1.6911e+02,
     1.7677e+02, 1.8442e+02, 1.9207e+02, 1.9971e+02, 2.0735e+02, 2.1499e+02,
     2.2263e+02, 2.3027e+02, 2.3790e+02, 2.4552e+02, 2.5315e+02, 2.6077e+02,
     2.6839e+02, 2.7600e+02, 2.8361e+02, 2.9122e+02, 2.9882e+02, 3.0642e+02,
     3.1401e+02, 3.2160e+02, 3.2918e+02, 3.3676e+02, 3.4433e+02, 3.5190e+02,
     3.5946e+02, 3.6701e+02, 3.7455e+02, 3.8209e+02, 3.8961e+02, 3.9712e+02,
     4.0462e+02, 4.1211e+02, 4.1958e+02, 4.2703e+02, 4.3447e+02, 4.4188e+02,
     4.4926e+02, 4.5661e+02, 4.6393e+02, 4.7122e+02, 4.7846e+02, 4.8565e+02,
     4.9278e+02, 4.9985e+02, 5.0685e+02, 5.1376e+02, 5.2057e+02, 5.2728e+02,
     5.3386e+02, 5.4029e+02, 5.4656e+02, 5.5265e+02, 5.5852e+02, 5.6415e+02,
     5.6950e+02, 5.7453e+02, 5.7920e+02, 5.8347e+02, 5.8727e+02, 5.9056e+02,
     5.9325e+02, 5.9527e+02, 5.9654e+02, 5.9697e+02, 5.9646e+02, 5.9490e+02,
     5.9217e+02, 5.9175e+02, 5.9419e+02, 5.9665e+02, 5.9790e+02, 6.0049e+02,
     6.0309e+02, 6.0410e+02, 6.0748e+02, 6.1034e+02, 6.1467e+02, 6.1658e+02,
     6.2282e+02, 6.2905e+02, 6.3528e+02, 6.4151e+02, 6.4772e+02, 6.5393e+02,
     6.6013e+02, 6.6632e+02, 6.7251e+02, 6.7868e+02, 6.8484e+02, 6.9099e+02,
     6.9712e+02, 7.0323e+02, 7.0931e+02, 7.1536e+02, 7.2137e+02, 7.2732e+02,
     7.3320e+02, 7.3899e+02, 7.4464e+02, 7.5013e+02, 7.5540e+02, 7.6039e+02,
     7.6502e+02, 7.6922e+02, 7.7287e+02, 7.7589e+02, 7.7817e+02, 7.7962e+02,
     7.8014e+02, 7.8039e+02, 7.8250e+02, 7.8464e+02, 7.8598e+02, 7.8823e+02,
     7.9050e+02, 7.9166e+02, 7.9458e+02, 7.9739e+02, 8.0109e+02, 8.0313e+02,
     8.0793e+02, 8.0888e+02, 8.1359e+02, 8.1462e+02, 8.1978e+02, 8.2036e+02,
     8.2330e+02, 8.2610e+02, 8.3183e+02, 8.3755e+02, 8.4326e+02, 8.4897e+02,
     8.5466e+02, 8.6035e+02, 8.6602e+02, 8.7168e+02, 8.7732e+02, 8.8295e+02,
     8.8855e+02, 8.9412e+02, 8.9965e+02, 9.0513e+02, 9.1055e+02, 9.1588e+02,
     9.2110e+02, 9.2618e+02, 9.3108e+02, 9.3576e+02, 9.4015e+02, 9.4420e+02,
     9.4784e+02, 9.5100e+02, 9.5362e+02, 9.5563e+02, 9.5698e+02, 9.5761e+02,
     9.5746e+02, 9.5650e+02, 9.5468e+02, 9.5195e+02, 9.4828e+02, 9.4363e+02,
     9.3796e+02, 9.3122e+02, 9.2337e+02, 9.1437e+02, 9.0418e+02, 8.9275e+02,
     8.8004e+02, 8.6600e+02, 8.5059e+02, 8.3376e+02, 8.1546e+02, 7.9566e+02,
     7.7430e+02, 7.5134e+02, 7.2674e+02, 7.0046e+02, 6.7244e+02, 6.4266e+02,
     6.1108e+02, 5.7765e+02, 5.4234e+02, 5.0512e+02, 4.6596e+02, 4.2483e+02,
     3.8170e+02, 3.3654e+02, 6.8800e-05, 5.1500e-05, 4.8000e-05, 4.7300e-05,
     4.7200e-05, 4.7200e-05, 4.7200e-05, 4.7200e-05, 1.5520e-04]

At any extremum of a smooth function, the derivative is zero. As we do not have an analytic expression for the data, the next best thing we can do is approximate the derivative. This is essentially the same as taking the 1-step difference and looking for values that are 'small'.
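Written as a formula, the approximation and the selection rule are shown below. This is a backward difference, and the notation is added here rather than taken from the answer.

```latex
y'(x_i) \approx \frac{y_i - y_{i-1}}{x_i - x_{i-1}},
\qquad
\text{keep } x_i \text{ as a candidate if } \lvert y_i - y_{i-1} \rvert < \text{tolerance}
```

Note that frame.diff() in find_extrema computes only the numerator y_i - y_{i-1}. The x spacing in this data is about 1.73 over most of the range, but much smaller in places and zero where x repeats at 346.86. A fixed tolerance on the raw difference is therefore not quite the same as a fixed tolerance on the slope.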

The following works well for me:

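import numpy as np
import pandas as pd


def find_extrema(frame, tolerance=0.5):
    # 1-step difference y[x] - y[x - 1]: an unscaled approximation of the derivative
    diff = frame.diff()

    # keep differences smaller than the tolerance in magnitude (all others become NaN)
    extrema = diff[np.abs(diff) < tolerance]

    # drop the NaN rows, leaving only the candidate extrema
    return extrema[~np.isnan(extrema.y)]


df = pd.DataFrame(dict(y=y), index=x)

candidates = find_extrema(df)

print(candidates)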

And I find:

                      y
0.10380    2.230000e-03
0.10430    2.230000e-03
0.10680    1.114000e-02
0.11932    5.572000e-02
0.18192    2.785800e-01
1.74380    2.533000e-01
149.21000  4.300000e-01
156.15000 -4.200000e-01
218.61000  2.500000e-01
282.80000 -1.500000e-01
346.86000 -1.730000e-05
346.86000 -3.500000e-06
346.86000 -7.000000e-07
346.86000 -1.000000e-07
346.86000  0.000000e+00
346.86000  0.000000e+00
346.86000  0.000000e+00
346.87000  1.080000e-04

This still requires some cleaning (mostly at the edges), but hopefully the general idea is clear.
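Below is one possible way to do that cleaning. It is a sketch only: the helper name clean_extrema, its edge parameter and the data are all made up for illustration.

It works in three steps:
- Divide the difference by the x spacing, so the tolerance applies to a slope.
- Skip a few points at each end.
- Collapse each run of consecutive candidates into the single point with the smallest |slope|.

Repeated x values give an infinite or undefined slope and are simply ignored. The tolerance and edge width would need tuning for real data.

```python
import numpy as np


def clean_extrema(x, y, tolerance=0.5, edge=2):
    """Hypothetical helper: x positions where |dy/dx| < tolerance, ignoring
    `edge` points at each end and keeping one point per run of candidates."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        slope = np.diff(y) / np.diff(x)  # backward difference; slope[k] belongs to point k + 1
        keep = np.isfinite(slope) & (np.abs(slope) < tolerance)  # repeated x gives inf/nan
    idx = np.arange(1, len(x))
    keep &= (idx >= edge) & (idx < len(x) - edge)  # drop points near both ends
    cand = idx[keep]
    if cand.size == 0:
        return []
    # Split the candidates into runs of consecutive indices; keep the flattest point per run.
    runs = np.split(cand, np.flatnonzero(np.diff(cand) > 1) + 1)
    best = [run[np.argmin(np.abs(slope[run - 1]))] for run in runs]
    return [float(x[i]) for i in best]


# Made-up data: a maximum at x = 4 and a minimum at x = 8.
x = np.arange(12.0)
y = np.array([0.0, 3.0, 6.0, 6.3, 6.4, 6.2, 3.0, 0.5, 0.4, 0.6, 3.0, 6.0])
print(clean_extrema(x, y))  # → [4.0, 8.0]
```

Like frame.diff(), the backward difference labels each step with its right-hand point.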

The following plot was made with:

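import numpy as np
import matplotlib.pyplot as plt

tolerance = 0.75

diff = df.diff()

# plot only the 1-step differences whose magnitude is below the tolerance
ax = diff[np.abs(diff) < tolerance].y.plot(
     title="Derivative approximation for tolerance = {0}".format(tolerance))

ax.set_xlabel("x")
ax.set_ylabel("y[x] - y[x - 1]")

plt.show()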

(Notice the larger tolerance, so that we can actually see some lines rather than just isolated points.)

[Figure: extrema (derivative approximation for tolerance = 0.75)]

You can also use the np.gradient function and look for where the gradient changes sign:

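import numpy as np
import matplotlib.pyplot as plt

z = np.gradient(y, x)  # numerical dy/dx at every point
i = 0
while i < len(x)-2:
    if (z[i]*z[i+2]<=0 and z[i]>0): #gradient changes sign > optima, and point previous to optima has a positive slope
        print(i+1, x[i+1], y[i+1])
        i = i+1  # skip the next index so one sign change is not reported twice
    i+=1

plt.ylim(-1, 1)
plt.plot(x, z)
plt.show()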
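The same test can be written without the while loop above. This is a minimal sketch with a hypothetical helper name, gradient_maxima, and made-up data.

For a positive z[i], the condition z[i]*z[i+2] <= 0 is the same as z[i+2] <= 0, so the whole array can be checked at once. An extra step reproduces the loop's skip of the index right after a hit.

In the question's data the last x values repeat (346.86). np.gradient would divide by a zero spacing there, so expect inf or nan values and a RuntimeWarning.

```python
import numpy as np


def gradient_maxima(x, y):
    """Hypothetical helper: loop-free version of the while loop above.
    Returns indices k where np.gradient(y, x) is > 0 at k - 1 and <= 0 at k + 1."""
    z = np.gradient(np.asarray(y, dtype=float), np.asarray(x, dtype=float))
    # z[i] > 0 and z[i]*z[i+2] <= 0 is equivalent to z[i] > 0 and z[i+2] <= 0
    hit = (z[:-2] > 0) & (z[2:] <= 0)
    i = np.flatnonzero(hit)
    # After a hit at i the loop skips i + 1, so drop a hit that directly follows another one.
    if i.size:
        i = i[np.r_[True, np.diff(i) > 1]]
    return i + 1


# Made-up data with maxima at indices 3 and 7.
x = np.arange(9.0)
y = [0, 1, 3, 4, 3.5, 1, 2, 5, 4]
print(gradient_maxima(x, y))  # → [3 7]
```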

Looking at the plot, it seems the point at around 210 is not a maximum (the gradient doesn't reach zero). You can check this by replacing the if statement with the following: if (y[i+1] > y[i] and y[i+1] > y[i+2]):
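That neighbour comparison can also be evaluated for all points at once with NumPy slicing. This is a minimal sketch; strict_local_maxima is a hypothetical name.

Both comparisons are strict, so it reports the same interior peaks as argrelextrema(y, np.greater) in the question. It therefore misses a shoulder that never turns downward. It also misses a flat-topped peak, where two neighbouring values are equal.

```python
import numpy as np


def strict_local_maxima(y):
    """Hypothetical helper: indices k with y[k-1] < y[k] > y[k+1], i.e. the
    condition y[i+1] > y[i] and y[i+1] > y[i+2] evaluated for every i."""
    y = np.asarray(y, dtype=float)
    mid = y[1:-1]
    is_peak = (mid > y[:-2]) & (mid > y[2:])
    return np.flatnonzero(is_peak) + 1


# Made-up data: sharp peaks at indices 1 and 7, a flat-topped peak at indices 4-5.
print(strict_local_maxima([1, 3, 2, 2, 5, 5, 4, 6, 1]))  # → [1 7]
```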

Here is my naive approach:

Step 1: build a list of slopes, where each entry is +1 if two consecutive y values are increasing, -1 if they are decreasing and 0 if they are equal:

import numpy as np
# sign of the change between consecutive y values: +1 rising, -1 falling, 0 flat
slope = [np.sign(y[i]-y[i-1]) for i in range(1, len(y))]

If you print slope, every entry is 0, 1 or -1, giving the direction of the slope between each pair of consecutive y points.

Step 2: to find the minima and maxima, I wrote this code, which checks whether the slope changes sign. If it changes from 1 to -1, the index is saved as a maximum; otherwise (from -1 to 1) it is saved as a minimum.

x_prev = slope[0]
optima_dic={'minima':[], 'maxima':[]}
for i in range(1, len(slope)):
    if slope[i]*x_prev==-1: #slope changed
        if x_prev==1: # slope changed from 1 to -1
            optima_dic['maxima'].append(i)
        else: # slope changed from -1 to 1
            optima_dic['minima'].append(i)
        x_prev=-x_prev

And if you print the results:

print(optima_dic)

Output:

{'minima': [109, 237], 'maxima': [105, 197]}
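Here is a loop-free sketch of the same slope-change idea, using NumPy only; slope_change_optima is a hypothetical name.

The loop only updates x_prev when the sign flips, so zero slopes are effectively skipped. The sketch does this explicitly by dropping flat steps first. As a result, it does not depend on the first step being non-flat. In the loop, if slope[0] is 0, then x_prev stays 0 and nothing is ever recorded.

Like the loop, it reports a flat-bottomed minimum at the end of the flat run. In the question's data, y is 4.72e-05 at indices 234 to 237, and the minimum is reported at 237.

```python
import numpy as np


def slope_change_optima(y):
    """Hypothetical helper: loop-free version of the slope-change idea above.
    Flat steps (slope 0) are ignored; an optimum is recorded at the index where
    a non-flat step of the opposite sign begins."""
    y = np.asarray(y, dtype=float)
    slope = np.sign(np.diff(y))               # slope[j] = sign(y[j+1] - y[j])
    nz = np.flatnonzero(slope)                # positions of the non-flat steps
    s = slope[nz]
    change = np.flatnonzero(s[1:] != s[:-1])  # consecutive non-flat steps with opposite signs
    idx = nz[change + 1]                      # index where the new sign starts
    return {'minima': idx[s[change] < 0].tolist(),
            'maxima': idx[s[change] > 0].tolist()}


# Made-up data: maxima at indices 2 and 6, a flat-bottomed minimum at indices 3-5.
print(slope_change_optima([0, 1, 2, 1, 1, 1, 3, 2]))
# → {'minima': [5], 'maxima': [2, 6]}
```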

Quick and dirty :)

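Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM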