[英]Average of every x rows with a step size of y per each subset using pandas
I have a pandas data frame like this:我有一个像这样的 pandas 数据框:
Subset Position Value
1 1 2
1 10 3
1 15 0.285714
1 43 1
1 48 0
1 89 2
1 132 2
1 152 0.285714
1 189 0.133333
1 200 0
2 1 0.133333
2 10 0
2 15 2
2 33 2
2 36 0.285714
2 72 2
2 132 0.133333
2 152 0.133333
2 220 3
2 250 8
2 350 6
2 750 0
I want to know how can I get the mean of values for every "x" row with "y" step size per subset in pandas?我想知道如何获得 pandas 中每个子集的“y”步长的每个“x”行的平均值?
For example, mean of every 5 rows (step size =2) for value column in each subset like this:例如,每个子集中的 value 列每 5 行的平均值(步长 = 2),如下所示:
Subset Start_position End_position Mean
1 1 48 1.2571428
1 15 132 1.0571428
1 48 189 0.8838094
2 1 36 0.8838094
2 15 132 1.2838094
2 36 220 1.110476
2 132 350 3.4533332
Is this what you were looking for:这是你要找的东西:
df = pd.DataFrame({'Subset': [1]*10+[2]*12,
'Position': [1,10,15,43,48,89,132,152,189,200,1,10,15,33,36,72,132,152,220,250,350,750],
'Value': [2,3,.285714,1,0,2,2,.285714,.1333333,0,0.133333,0,2,2,.285714,2,.133333,.133333,3,8,6,0]})
averaged_df = pd.DataFrame(columns=['Subset', 'Start_position', 'End_position', 'Mean'])
window = 5
step_size = 2
for subset in df.Subset.unique():
subset_df = df[df.Subset==subset].reset_index(drop=True)
for i in range(0,len(df),step_size):
window_rows = subset_df.iloc[i:i+window]
if len(window_rows) < window:
continue
window_average = {'Subset': window_rows.Subset.loc[0+i],
'Start_position': window_rows.Position[0+i],
'End_position': window_rows.Position.iloc[-1],
'Mean': window_rows.Value.mean()}
averaged_df = averaged_df.append(window_average,ignore_index=True)
Some notes about the code:关于代码的一些注释:
1,1,2,1,2,2
will behave as if it was 1,1,1,2,2,2
)1,1,2,1,2,2
将表现得好像它是1,1,1,2,2,2
)1, 132, 200, 0,60476
is not included`)1, 132, 200, 0,60476
)One version specific answer would be, using pandas.api.indexers.FixedForwardWindowIndexer
introduced in pandas 1.1.0
:一个特定版本的答案是,使用
pandas.api.indexers.FixedForwardWindowIndexer
pandas 1.1.0
中引入的 pandas.api.indexers.FixedForwardWindowIndexer :
>>> window=5
>>> step=2
>>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=window)
>>> df2 = df.join(df.Position.shift(-(window-1)), lsuffix='_start', rsuffix='_end')
>>> df2 = df2.assign(Mean=df2.pop('Value').rolling(window=indexer).mean()).iloc[::step]
>>> df2 = df2[df2.Position_start.lt(df2.Position_end)].dropna()
>>> df2['Position_end'] = df2['Position_end'].astype(int)
>>> df2
Subset Position_start Position_end Mean
0 1 1 48 1.257143
2 1 15 132 1.057143
4 1 48 189 0.883809
10 2 1 36 0.883809
12 2 15 132 1.283809
14 2 36 220 1.110476
16 2 132 350 3.453333
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.