Select/Group rows from a data frame with the nearest values for a specific column(s)

I have two columns in a data frame (a sample is shown below). Usually, columns A and B contain runs of 10 to 12 consecutive rows with similar values, for example from index 1 to 10 and then from index 11 to 21. I would like to group these rows and get the mean and standard deviation of each group. I found the following line of code, which gives me the index of the nearest value, but I don't know how to apply it repeatedly:

Index = df['A'].sub(df['A'][0]).abs().idxmin()

Does anyone have any ideas on how to approach this problem?

       A                    B
1   3652.194531     -1859.805238
2   3739.026566     -1881.965576
3   3742.095325     -1878.707674
4   3747.016899     -1878.728626
5   3746.214554     -1881.270329
6   3750.325368     -1882.915532
7   3748.086576     -1882.406672
8   3751.786422     -1886.489485
9   3755.448968     -1885.695822
10  3753.714126     -1883.504098
11  -337.969554     24.070990
12  -343.019575     23.438956
13  -344.788697     22.250254
14  -346.433460     21.912217
15  -343.228579     22.178519
16  -345.722368     23.037441
17  -345.923108     23.317620
18  -345.526633     21.416528
19  -347.555162     21.315934
20  -347.229210     21.565183
21  -344.575181     22.963298
22  23.611677   -8.499528
23  26.320500   -8.744512
24  24.374874   -10.717384
25  25.885272   -8.982414
26  24.448127   -9.002646
27  23.808744   -9.568390
28  24.717935   -8.491659
29  25.811393   -8.773649
30  25.084683   -8.245354
31  25.345618   -7.508419
32  23.286342   -10.695104
33  -3184.426285    -2533.374402
34  -3209.584366    -2553.310934
35  -3210.898611    -2555.938332
36  -3214.234899    -2558.244347
37  -3216.453616    -2561.863807
38  -3219.326197    -2558.739058
39  -3214.893325    -2560.505207
40  -3194.421934    -2550.186647
41  -3219.728445    -2562.472566
42  -3217.630380    -2562.132186
43  234.800448  -75.157523
44  236.661235  -72.617806
45  238.300501  -71.963103
46  239.127539  -72.797922
47  232.305335  -70.634125
48  238.452197  -73.914015
49  239.091210  -71.035163
50  239.855953  -73.961841
51  238.936811  -73.887023
52  238.621490  -73.171441
53  240.771812  -73.847028
54  -16.798565  4.421919
55  -15.952454  3.911043
56  -14.337879  4.236691
57  -17.465204  3.610884
58  -17.270147  4.407737
59  -15.347879  3.256489
60  -18.197750  3.906086

A simpler approach consists in grouping consecutive values whose percentage change does not exceed a given threshold (say 0.5):

df['Group'] = (df.A.pct_change().abs()>0.5).cumsum()
df.groupby('Group').agg(['mean', 'std'])

Output:

                 A                       B          
              mean        std         mean       std
Group                                               
0      3738.590934  30.769420 -1880.148905  7.582856
1      -344.724684   2.666137    22.496995  0.921008
2        24.790470   0.994361    -9.020824  0.977809
3     -3210.159806  11.646589 -2555.676749  8.810481
4       237.902230   2.439297   -72.998817  1.366350
5       -16.481411   1.341379     3.964407  0.430576
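
To make the individual steps of this approach easier to follow, here is a minimal, self-contained sketch. The miniature data frame is hypothetical (a few rounded values in the style of the question's sample), and the intermediate names `change` and `starts` are introduced only for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's data: three runs of similar values.
df = pd.DataFrame({
    "A": [3652.2, 3739.0, 3742.1, -338.0, -343.0, -344.8, 23.6, 26.3, 24.4],
    "B": [-1859.8, -1882.0, -1878.7, 24.1, 23.4, 22.3, -8.5, -8.7, -10.7],
})

# Relative change of each row with respect to the previous row (NaN for the first row).
change = df["A"].pct_change().abs()

# True marks the first row of a new run; the NaN in the first row compares as False.
starts = change > 0.5

# The cumulative sum of the flags yields one integer label per run.
df["Group"] = starts.cumsum()

# Mean and standard deviation of A and B within each run.
summary = df.groupby("Group").agg(["mean", "std"])
print(summary)
```

The threshold is a judgment call: it has to be larger than the relative change seen inside a run and smaller than the relative jump between runs, so it may need adjusting for other data.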

Note: I have only used the "A" column, since the "B" column appears to follow the same pattern of consecutive nearest values. You can check whether the identified groups are the same for both columns with:

# use the same 0.5 threshold as in the grouping step
grps = (df[['A','B']].pct_change().abs()>0.5).cumsum()
grps.A.eq(grps.B).all()
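
The result of this check depends on the threshold. The sketch below uses a hypothetical helper, `run_labels`, and a few made-up values in the style of the sample to show that two columns can agree at one threshold and disagree at another, because a jump from about -2562 to about -75 is only a 97% change.

```python
import pandas as pd

def run_labels(series: pd.Series, threshold: float) -> pd.Series:
    """Label consecutive runs: a new run starts where |pct_change| > threshold."""
    return (series.pct_change().abs() > threshold).cumsum()

# Hypothetical data: two runs; B jumps from -2562.1 to -75.2 (about a 97% change).
df = pd.DataFrame({
    "A": [-3219.7, -3217.6, 234.8, 236.7],
    "B": [-2562.5, -2562.1, -75.2, -72.6],
})

# Compare the run labels of both columns at two different thresholds.
same_at_half = bool(run_labels(df["A"], 0.5).eq(run_labels(df["B"], 0.5)).all())
same_at_one = bool(run_labels(df["A"], 1.0).eq(run_labels(df["B"], 1.0)).all())
print(same_at_half, same_at_one)
```

Running the comparison with the same threshold that was used for the grouping keeps the check meaningful.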

If you know the length of each group/index set you want, you can first subset the column and rows with:

    df['A'].iloc[0:11].mean()

Then compute the standard deviation of the same slice, for example with .std() in place of .mean().
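
If the group lengths really are known in advance, slicing each group by hand can be avoided by building one label per row and grouping on it. This is only a sketch: the data and the `lengths` list are hypothetical, and the lengths must add up to the number of rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data; in practice this would be the question's data frame.
df = pd.DataFrame({"A": [3652.2, 3739.0, 3742.1, -338.0, -343.0, 23.6, 26.3, 24.4]})

# Assumed to be known beforehand; must sum to len(df).
lengths = [3, 2, 3]

# Repeat each group number as many times as the group is long: 0, 0, 0, 1, 1, 2, 2, 2.
labels = np.repeat(np.arange(len(lengths)), lengths)

# Group positionally by the label array and aggregate column A.
stats = df.groupby(labels)["A"].agg(["mean", "std"])
print(stats)
```

Unlike the percentage-change approach, this does not look at the values at all, so it only works when the lengths are correct.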

