Select/group rows from a data frame with the nearest values for a specific column

Question

I have two columns in a data frame (a sample is shown below). Usually, columns A and B contain runs of 10 to 12 rows with similar values, for example from index 1 to 10 and then from index 11 to 21. I would like to group these values and get the mean and standard deviation of each group. I found the following line of code, which gives me the index of the nearest value, but I don't know how to do this repeatedly:

Index = df['A'].sub(df['A'][0]).abs().idxmin()

Does anyone have any ideas on how to approach this problem?

       A                    B
1   3652.194531     -1859.805238
2   3739.026566     -1881.965576
3   3742.095325     -1878.707674
4   3747.016899     -1878.728626
5   3746.214554     -1881.270329
6   3750.325368     -1882.915532
7   3748.086576     -1882.406672
8   3751.786422     -1886.489485
9   3755.448968     -1885.695822
10  3753.714126     -1883.504098
11  -337.969554     24.070990
12  -343.019575     23.438956
13  -344.788697     22.250254
14  -346.433460     21.912217
15  -343.228579     22.178519
16  -345.722368     23.037441
17  -345.923108     23.317620
18  -345.526633     21.416528
19  -347.555162     21.315934
20  -347.229210     21.565183
21  -344.575181     22.963298
22  23.611677   -8.499528
23  26.320500   -8.744512
24  24.374874   -10.717384
25  25.885272   -8.982414
26  24.448127   -9.002646
27  23.808744   -9.568390
28  24.717935   -8.491659
29  25.811393   -8.773649
30  25.084683   -8.245354
31  25.345618   -7.508419
32  23.286342   -10.695104
33  -3184.426285    -2533.374402
34  -3209.584366    -2553.310934
35  -3210.898611    -2555.938332
36  -3214.234899    -2558.244347
37  -3216.453616    -2561.863807
38  -3219.326197    -2558.739058
39  -3214.893325    -2560.505207
40  -3194.421934    -2550.186647
41  -3219.728445    -2562.472566
42  -3217.630380    -2562.132186
43  234.800448  -75.157523
44  236.661235  -72.617806
45  238.300501  -71.963103
46  239.127539  -72.797922
47  232.305335  -70.634125
48  238.452197  -73.914015
49  239.091210  -71.035163
50  239.855953  -73.961841
51  238.936811  -73.887023
52  238.621490  -73.171441
53  240.771812  -73.847028
54  -16.798565  4.421919
55  -15.952454  3.911043
56  -14.337879  4.236691
57  -17.465204  3.610884
58  -17.270147  4.407737
59  -15.347879  3.256489
60  -18.197750  3.906086

Answer 1

A simpler approach consists of grouping consecutive values whose percentage change does not exceed a given threshold (say 0.5, i.e. a 50% change from the previous row):

df['Group'] = (df.A.pct_change().abs()>0.5).cumsum()
df.groupby('Group').agg(['mean', 'std'])

Output:

                 A                       B          
              mean        std         mean       std
Group                                               
0      3738.590934  30.769420 -1880.148905  7.582856
1      -344.724684   2.666137    22.496995  0.921008
2        24.790470   0.994361    -9.020824  0.977809
3     -3210.159806  11.646589 -2555.676749  8.810481
4       237.902230   2.439297   -72.998817  1.366350
5       -16.481411   1.341379     3.964407  0.430576
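For readers who want to try this without the full data set, here is a minimal self-contained sketch of the same idea. It uses only nine rows copied from the question's sample (three from each of the first three groups), so the statistics it prints will not match the table above; the variable names jump and stats are just illustrative.

```python
import pandas as pd

# Nine rows copied from the question's sample: three from each of the first three groups.
df = pd.DataFrame({
    "A": [3739.026566, 3742.095325, 3747.016899,
          -337.969554, -343.019575, -344.788697,
          23.611677, 26.320500, 24.374874],
    "B": [-1881.965576, -1878.707674, -1878.728626,
          24.070990, 23.438956, 22.250254,
          -8.499528, -8.744512, -10.717384],
})

# Relative change from the previous row. The first row is NaN, and NaN > 0.5 is False.
jump = df["A"].pct_change().abs() > 0.5

# Every True marks the first row of a new group, so the running sum acts as a group label.
df["Group"] = jump.cumsum()

# Mean and standard deviation of both columns within each group.
stats = df.groupby("Group")[["A", "B"]].agg(["mean", "std"])
print(stats)
```

The threshold is a judgment call: it has to be larger than the relative change between neighbouring rows inside a group and smaller than the jump between groups.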

Note: I have only used the "A" column, since the "B" column appears to follow the same pattern of consecutive nearest values. You can check whether the groups identified from each column are the same with:

grps = (df[['A','B']].pct_change().abs()>1).cumsum()
grps.A.eq(grps.B).all()
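As a rough illustration of that check, the sketch below runs it on a handful of rows copied from the question's sample. The threshold of 1 is taken from the snippet above. A row whose sign differs from the previous row's always has an absolute relative change greater than 1, so this threshold catches every sign flip. The name same_groups is only illustrative.

```python
import pandas as pd

# Six rows copied from the question's sample: three from each of the first two groups.
df = pd.DataFrame({
    "A": [3739.026566, 3742.095325, 3747.016899,
          -337.969554, -343.019575, -344.788697],
    "B": [-1881.965576, -1878.707674, -1878.728626,
          24.070990, 23.438956, 22.250254],
})

# Group labels computed independently for each column.
grps = (df[["A", "B"]].pct_change().abs() > 1).cumsum()

# True only if every row receives the same label from column A and from column B.
same_groups = bool(grps["A"].eq(grps["B"]).all())
print(same_groups)
```

If the result is False, the two columns disagree about where at least one group starts, and grouping on "A" alone may not be appropriate for "B".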

Answer 2

I would say that if you know the length of each group/index set you want, then you can first subset the column and rows with:

    df['A'].iloc[0:10].mean()  # positions 0-9, i.e. the first group of 10 rows

The standard deviation of the same slice can then be computed the same way, using .std() in place of .mean().
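If the group lengths really are known in advance, the slicing can be generalised to all groups at once. The following is only a sketch under that assumption: the lengths list is hypothetical and would have to be supplied by hand, and the data is a few rows copied from the question's sample.

```python
import numpy as np
import pandas as pd

# Eight rows copied from the question's sample.
df = pd.DataFrame({"A": [3739.026566, 3742.095325, 3747.016899,
                         -337.969554, -343.019575,
                         23.611677, 26.320500, 24.374874]})

# Assumption: the number of consecutive rows in each group is known beforehand.
lengths = [3, 2, 3]

# Repeat each group number once per row of that group: [0, 0, 0, 1, 1, 2, 2, 2].
labels = np.repeat(np.arange(len(lengths)), lengths)

# Mean and sample standard deviation of column A per group.
stats = df.groupby(labels)["A"].agg(["mean", "std"])
print(stats)

# The same numbers for the first group alone, using a positional slice.
first = df["A"].iloc[0:lengths[0]]
print(first.mean(), first.std())
```

Unlike the threshold-based approach, this does not look at the values at all, so it only gives sensible results when the lengths are correct.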
