Mean of every x rows with step size y per subset using pandas

Question

I have a pandas DataFrame like this:

Subset  Position    Value
1   1   2
1   10  3
1   15  0.285714
1   43  1
1   48  0
1   89  2
1   132 2
1   152 0.285714
1   189 0.133333
1   200 0
2   1   0.133333
2   10  0
2   15  2
2   33  2
2   36  0.285714
2   72  2
2   132 0.133333
2   152 0.133333
2   220 3
2   250 8
2   350 6
2   750 0

How can I get the mean of the values for every "x" rows with a step size of "y", per subset, in pandas?

For example, the mean of every 5 rows (step size = 2) of the Value column in each subset, like this:

Subset  Start_position  End_position    Mean
1   1   48  1.2571428
1   15  132 1.0571428
1   48  189 0.8838094
2   1   36  0.8838094
2   15  132 1.2838094
2   36  220 1.110476
2   132 350 3.4533332

Answer 1

Is this what you were looking for?

import pandas as pd

df = pd.DataFrame({'Subset': [1]*10+[2]*12,
                   'Position': [1,10,15,43,48,89,132,152,189,200,1,10,15,33,36,72,132,152,220,250,350,750],
                   'Value': [2,3,.285714,1,0,2,2,.285714,.133333,0,0.133333,0,2,2,.285714,2,.133333,.133333,3,8,6,0]})

window = 5
step_size = 2
rows = []
# Handle each subset separately so that no window spans two subsets.
for subset in df.Subset.unique():
    subset_df = df[df.Subset==subset].reset_index(drop=True)
    # Stop at the last start index that still leaves a full window.
    for i in range(0, len(subset_df) - window + 1, step_size):
        window_rows = subset_df.iloc[i:i+window]
        rows.append({'Subset': subset,
                     'Start_position': window_rows.Position.iloc[0],
                     'End_position': window_rows.Position.iloc[-1],
                     'Mean': window_rows.Value.mean()})

# DataFrame.append was removed in pandas 2.0, so build the result from a list of dicts.
averaged_df = pd.DataFrame(rows, columns=['Subset', 'Start_position', 'End_position', 'Mean'])

Some notes about the code:

It treats each subset as one contiguous block, whatever the row order in the original df (1,1,2,1,2,2 behaves as if it were 1,1,1,2,2,2).
If the rows left at the end of a subset are fewer than a full window, they are skipped (e.g. the row 1, 132, 200, 0.60476 is not included).
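To make the skipping rule concrete, here is a minimal sketch of which row ranges end up as windows for a subset of a given length. `window_bounds` is a hypothetical helper introduced only for illustration; it is not part of the answer's code or of pandas.

```python
def window_bounds(n_rows, window, step):
    """Return (start, end) row indices of every full window; end is exclusive."""
    # The last valid start is n_rows - window, so shorter leftovers never appear.
    return [(i, i + window) for i in range(0, n_rows - window + 1, step)]


print(window_bounds(10, 5, 2))  # subset 1 has 10 rows: [(0, 5), (2, 7), (4, 9)]
print(window_bounds(12, 5, 2))  # subset 2 has 12 rows: [(0, 5), (2, 7), (4, 9), (6, 11)]
```

For subset 1 the next start would be row 6, which leaves only four rows (positions 132 to 200), so that group is dropped.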

Answer 2

One version-specific answer is to use pandas.api.indexers.FixedForwardWindowIndexer, introduced in pandas 1.1.0:

>>> window=5
>>> step=2
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=window)
>>> df2 = df.join(df.Position.shift(-(window-1)), lsuffix='_start', rsuffix='_end')
>>> df2 = df2.assign(Mean=df2.pop('Value').rolling(window=indexer).mean()).iloc[::step]
>>> df2 = df2[df2.Position_start.lt(df2.Position_end)].dropna()
>>> df2['Position_end'] = df2['Position_end'].astype(int)

>>> df2

    Subset  Position_start  Position_end      Mean
0        1               1            48  1.257143
2        1              15           132  1.057143
4        1              48           189  0.883809
10       2               1            36  0.883809
12       2              15           132  1.283809
14       2              36           220  1.110476
16       2             132           350  3.453333
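Note that the snippet above rolls over the whole frame at once, so it seems to rely on two properties of this data: each subset has an even number of rows (otherwise `iloc[::step]` would start the next subset on its second row), and Position restarts at a lower value in each new subset (so the `lt` filter can drop windows that cross a subset boundary). A possible variant that handles each subset explicitly is sketched below. `windowed_means` is a hypothetical helper name, and the plain right-aligned `rolling(window)` is swapped in for the forward indexer.

```python
import pandas as pd

df = pd.DataFrame({
    'Subset': [1] * 10 + [2] * 12,
    'Position': [1, 10, 15, 43, 48, 89, 132, 152, 189, 200,
                 1, 10, 15, 33, 36, 72, 132, 152, 220, 250, 350, 750],
    'Value': [2, 3, .285714, 1, 0, 2, 2, .285714, .133333, 0,
              .133333, 0, 2, 2, .285714, 2, .133333, .133333, 3, 8, 6, 0],
})


def windowed_means(df, window=5, step=2):
    """Mean of Value over every `window` rows, advancing `step` rows, per Subset."""
    parts = []
    for subset, g in df.groupby('Subset', sort=False):
        if len(g) < window:
            continue  # no full window fits in this subset
        g = g.reset_index(drop=True)
        # Right-aligned rolling mean: the value at row k covers rows k-window+1..k.
        means = g['Value'].rolling(window).mean()
        ends = list(range(window - 1, len(g), step))   # last row of each kept window
        starts = [e - window + 1 for e in ends]        # first row of each kept window
        parts.append(pd.DataFrame({
            'Subset': subset,
            'Start_position': g['Position'].iloc[starts].to_numpy(),
            'End_position': g['Position'].iloc[ends].to_numpy(),
            'Mean': means.iloc[ends].to_numpy(),
        }))
    return pd.concat(parts, ignore_index=True)


result = windowed_means(df)
print(result)
```

On the sample data this should give the same seven windows as the output above, with a fresh 0..6 index instead of the original row labels.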

Answer 1: score 2, accepted, 2021-02-24 23:12:03

Answer 2: score 2, 2021-02-24 23:23:04
