[英]In pandas, how do you groupby two columns and sum a third distinct column?
Suppose you have the below data frame called raw
:假设您有以下名为
raw
的数据框:
A B C
Index
2007-01-03 NaN -1 1
2007-01-04 0.021954 1 2
2007-01-05 -0.007147 -1 3
2007-01-08 0.004926 1 4
2007-01-09 0.079800 1 4
2007-01-10 0.046746 1 4
2007-01-11 -0.012448 -1 5
2007-01-12 -0.012394 -1 5
2007-01-16 0.025873 1 6
2007-01-17 -0.022391 -1 7
2007-01-18 -0.063928 -1 7
2007-01-19 -0.006420 -1 7
2007-01-22 -0.019511 -1 7
2007-01-23 -0.012639 -1 7
2007-01-24 0.011601 1 8
2007-01-25 -0.005204 -1 9
2007-01-26 -0.010138 -1 9
2007-01-29 0.006538 1 10
2007-01-30 -0.004549 -1 11
2007-01-31 0.002102 1 12
How can I groupby columns B & the count of column C while also summing column A?如何对 B 列和 C 列的计数进行分组,同时对 A 列求和? Am able to produce the former via
raw[['B','C']].value_counts()
(grouping by column B and count of C) but unable to combine this result with the latter (sum of column A).我能够通过
raw[['B','C']].value_counts()
(按 B 列和 C 的计数分组)产生前者,但无法将此结果与后者(A 列的总和)结合起来。
raw[['B','C']].value_counts()
B C
-1 7 5
1 4 3
-1 5 2
9 2
1 1
3 1
11 1
1 2 1
6 1
8 1
10 1
12 1
dtype: int64
I think you want to group by B,C
:我想你想按
B,C
:
df.groupby(['B','C']).agg({'C':'count', 'A':'sum'})
Output: Output:
C A
B C
-1 1 1 0.000000
3 1 -0.007147
5 2 -0.024842
7 5 -0.124889
9 2 -0.015342
11 1 -0.004549
1 2 1 0.021954
4 3 0.131472
6 1 0.025873
8 1 0.011601
10 1 0.006538
12 1 0.002102
Or better yet with named agg, which allows you to rename the new columns:或者使用命名 agg 更好,它允许您重命名新列:
df.groupby(['B','C']).agg(C_count=('C','count'),
A_sum=('A','sum'))
Output: Output:
C_count A_sum
B C
-1 1 1 0.000000
3 1 -0.007147
5 2 -0.024842
7 5 -0.124889
9 2 -0.015342
11 1 -0.004549
1 2 1 0.021954
4 3 0.131472
6 1 0.025873
8 1 0.011601
10 1 0.006538
12 1 0.002102
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.