Sum specific cells in a Pandas dataframe
I have a long Pandas dataframe (10,000 by 5), and I need to sum every block of 10 consecutive cells in a column. My table looks like this.
I was hoping my code would look like this, but I'm getting errors:
for i in range(1, 10000):
    if i % 10 == 0:
        # sum rows i-10 .. i-1 of the fourth column
        avg = df.iloc[i - 10 : i, 3].sum()
        # store the result in the fifth column of row i
        df.iloc[i, 4] = avg
Is there a more Pythonic way to calculate and store these block sums (or averages)?
Use GroupBy.transform with sum to create a new column filled with the aggregated values. If you only need the value on the last row of each group, add a mask to DataFrame.loc:
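import numpy as np
import pandas as pd

np.random.seed(2020)
df = pd.DataFrame(np.random.randint(10, size=(10000, 4))).add_prefix('col')
a = df.index
# if the index is not the default RangeIndex, use row positions instead:
# a = np.arange(len(df))

# block sum of col3 (blocks of 10 rows), repeated on every row of the block
df['sum1'] = df.iloc[:, 3].groupby(a // 10).transform('sum')
# same block sums, written only to the last row of each block (other rows stay NaN)
df.loc[a % 10 == 9, 'sum2'] = df.iloc[:, 3].groupby(a // 10).transform('sum')
print(df.head(20))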
col0 col1 col2 col3 sum1 sum2
0 0 8 3 6 44 NaN
1 3 3 7 8 44 NaN
2 0 0 8 9 44 NaN
3 3 7 2 3 44 NaN
4 6 5 0 4 44 NaN
5 8 6 4 1 44 NaN
6 1 5 9 5 44 NaN
7 6 6 6 5 44 NaN
8 4 6 4 2 44 NaN
9 3 4 7 1 44 44.0
10 4 9 3 2 40 NaN
11 0 9 1 2 40 NaN
12 7 1 0 2 40 NaN
13 8 8 5 6 40 NaN
14 3 3 0 0 40 NaN
15 4 6 6 8 40 NaN
16 9 9 9 5 40 NaN
17 1 9 0 1 40 NaN
18 7 5 0 7 40 NaN
19 1 3 7 7 40 40.0
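For comparison with the loop in the question, here is a minimal sketch of a corrected loop next to the vectorized `sum2` approach, assuming (as this answer does) that the sum is over the fourth column and belongs on the last row of each block of 10. A 30-row frame stands in for the 10,000-row one. As written in the question, the loop stores each sum on row `i`, which is the first row of the next block, and `range(1, 10000)` never reaches 10000, so the final block is skipped; the sketch steps through block ends with `range(10, len(df) + 1, 10)` and writes to row `i - 1` instead. The names `pos`, `loop_sum`, `block_sum` and `vec_sum` are illustrative only.

```python
import numpy as np
import pandas as pd

# Small stand-in for the 10,000-row frame: 30 rows, four integer columns
np.random.seed(2020)
df = pd.DataFrame(np.random.randint(10, size=(30, 4))).add_prefix('col')
pos = np.arange(len(df))  # positional row numbers, independent of the index

# Loop version: for each block end i, sum rows i-10 .. i-1 of column 3
# and store the result on row i-1, the last row of that block
loop_sum = pd.Series(np.nan, index=df.index)
for i in range(10, len(df) + 1, 10):
    loop_sum.iloc[i - 1] = df.iloc[i - 10:i, 3].sum()

# Vectorized version: block sums kept only on each block's last row, NaN elsewhere
block_sum = df.iloc[:, 3].groupby(pos // 10).transform('sum')
vec_sum = block_sum.where(pos % 10 == 9)

print(loop_sum.equals(vec_sum))
```

The loop makes one Python-level iteration per block, while `groupby(...).transform('sum')` does the work in a single vectorized call, which is why it is the usual choice for large frames.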
Use groupby on row // 10 (each row's position integer-divided by 10), then take the mean of each group. Does that get you moving?
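A minimal sketch of that suggestion, assuming "the row" means each row's position, so it still works when the index is not 0, 1, 2, .... The 20-row frame with index 100 to 119 is hypothetical. `.mean()` returns one value per block, while `.transform('mean')` broadcasts each block's mean back onto its rows; swap in `'sum'` for the totals the question asks about.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a non-default index (100..119); col3 holds 0.0 .. 19.0
df = pd.DataFrame({'col3': np.arange(20, dtype=float)}, index=np.arange(100, 120))

# Block number from the row position: 0 for the first ten rows, 1 for the next ten
row_block = np.arange(len(df)) // 10

# One mean per block of 10 rows
block_means = df['col3'].groupby(row_block).mean()
print(block_means)

# The same means broadcast back onto every row (the result keeps df's index, so it aligns)
df['block_mean'] = df['col3'].groupby(row_block).transform('mean')
print(df.head(12))
```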