Group by a column and count items above 5 in another column (pandas)
So I have a df like this:
NAME TRY SCORE
Bob 1st 3
Sue 1st 7
Tom 1st 3
Max 1st 8
Jay 1st 4
Mel 1st 7
Bob 2nd 4
Sue 2nd 2
Tom 2nd 6
Max 2nd 4
Jay 2nd 7
Mel 2nd 8
Bob 3rd 3
Sue 3rd 5
Tom 3rd 6
Max 3rd 3
Jay 3rd 4
Mel 3rd 6
I want to count how many times each person scored above 5,
and put the result into a new df2 that looks like this:
NAME COUNT
Bob 0
Sue 1
Tom 2
Max 1
Jay 1
Mel 3
I have made many attempts; this is the latest:
df2 = df.groupby('NAME')[['SCORE'] > 5].count().reset_index(name="count")
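As a side note on why the attempt above cannot work: Python evaluates the expression inside the outer brackets before pandas sees it, and that expression compares a plain list with an integer. A minimal sketch of just that sub-expression (the variable name `error_message` is only illustrative):

```python
# ['SCORE'] > 5 is a list-versus-int comparison, which Python 3 rejects.
try:
    ['SCORE'] > 5
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The comparison has to be applied to the column itself, for example `df['SCORE'] > 5`, so that it produces one boolean per row.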
First create a boolean mask, then aggregate by sum; True values are treated like 1 in the process:
df2 = (df['SCORE'] > 5).groupby(df['NAME']).sum().astype(int).reset_index(name="count")
print (df2)
NAME count
0 Bob 0
1 Jay 1
2 Max 1
3 Mel 3
4 Sue 1
5 Tom 2
Details:
print (df['SCORE'] > 5)
0 False
1 True
2 False
3 True
4 False
5 True
6 False
7 False
8 True
9 False
10 True
11 True
12 False
13 False
14 True
15 False
16 False
17 True
Name: SCORE, dtype: bool
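To make the mask-then-sum approach reproducible, here is a minimal sketch that rebuilds the sample frame from the question. Building `df` from Python lists is an assumption about how the data is loaded; the column names and values are taken from the question.

```python
import pandas as pd

# Sample data from the question: six people, three tries each.
df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'TRY': ['1st'] * 6 + ['2nd'] * 6 + ['3rd'] * 6,
    'SCORE': [3, 7, 3, 8, 4, 7,
              4, 2, 6, 4, 7, 8,
              3, 5, 6, 3, 4, 6],
})

# One boolean per row: True where the score is strictly above 5.
mask = df['SCORE'] > 5

# Summing booleans per NAME counts the True values in each group.
df2 = mask.groupby(df['NAME']).sum().astype(int).reset_index(name='count')
print(df2)
```

Because the mask is grouped by the `NAME` column of the original frame, people with no score above 5 (Bob here) still appear with a count of 0.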
Just using groupby and sum:
df.assign(SCORE=df.SCORE.gt(5)).groupby('NAME')['SCORE'].sum().astype(int).reset_index()
Out[524]:
NAME SCORE
0 Bob 0
1 Jay 1
2 Max 1
3 Mel 3
4 Sue 1
5 Tom 2
Or we can use set_index and sum:
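# Series.sum(level=0) was removed in pandas 2.0; groupby(level=0).sum() is the equivalent
df.set_index('NAME').SCORE.gt(5).groupby(level=0).sum().astype(int)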
One way is to write a custom groupby function that takes the scores of each group and counts those greater than 5, like this:
df.groupby('NAME')['SCORE'].agg(lambda x: (x > 5).sum())
NAME
Bob 0
Jay 1
Max 1
Mel 3
Sue 1
Tom 2
Name: SCORE, dtype: int64
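If the goal is a frame shaped like the `df2` in the question (a `COUNT` column, names in their original order), the same lambda can be combined with named aggregation and `sort=False`. This is a sketch that assumes pandas 0.25 or later, where named aggregation is available; the sample frame is rebuilt from the question's data.

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'TRY': ['1st'] * 6 + ['2nd'] * 6 + ['3rd'] * 6,
    'SCORE': [3, 7, 3, 8, 4, 7,
              4, 2, 6, 4, 7, 8,
              3, 5, 6, 3, 4, 6],
})

df2 = (
    df.groupby('NAME', sort=False)                           # keep first-appearance order
      .agg(COUNT=('SCORE', lambda s: (s > 5).sum()))         # count scores above 5 per group
      .reset_index()
)
print(df2)
```

A lambda passed to `agg` runs once per group in Python, so on large frames the mask-then-sum variants are usually faster.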
If you want the counts as a dictionary, you can use collections.Counter:
from collections import Counter
c = Counter(df.loc[df['SCORE'] > 5, 'NAME'])
For a DataFrame, you can map the counts onto the unique names:
res = pd.DataFrame({'NAME': df['NAME'].unique(), 'COUNT': 0})
res['COUNT'] = res['NAME'].map(c).fillna(0).astype(int)
print(res)
COUNT NAME
0 0 Bob
1 1 Sue
2 2 Tom
3 1 Max
4 1 Jay
5 3 Mel
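The Counter approach works because a `Counter` returns 0 for keys it has never seen, so names with no qualifying score still get a count. A minimal sketch, with the sample frame rebuilt from the question's data:

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'TRY': ['1st'] * 6 + ['2nd'] * 6 + ['3rd'] * 6,
    'SCORE': [3, 7, 3, 8, 4, 7,
              4, 2, 6, 4, 7, 8,
              3, 5, 6, 3, 4, 6],
})

# Count the names of the rows whose score is above 5.
c = Counter(df.loc[df['SCORE'] > 5, 'NAME'])
print(dict(c))

# Bob never scored above 5, so he is not a key, but lookup still yields 0.
print('Bob' in c, c['Bob'])
```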
First filter the DataFrame, then group by with an aggregation, and reindex to fill in the missing values.
df[df['SCORE'] > 5].groupby('NAME')['SCORE'].size()\
.reindex(df['NAME'].unique(), fill_value=0)
Output:
NAME
Bob 0
Sue 1
Tom 2
Max 1
Jay 1
Mel 3
Name: SCORE, dtype: int64
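The reindex step matters because filtering first removes every row of anyone who never scored above 5, so that person disappears from the grouped result. A minimal sketch showing both stages, with the sample frame rebuilt from the question's data (the names `filtered` and `counts` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'NAME': ['Bob', 'Sue', 'Tom', 'Max', 'Jay', 'Mel'] * 3,
    'TRY': ['1st'] * 6 + ['2nd'] * 6 + ['3rd'] * 6,
    'SCORE': [3, 7, 3, 8, 4, 7,
              4, 2, 6, 4, 7, 8,
              3, 5, 6, 3, 4, 6],
})

# After filtering, Bob has no rows left, so he is absent from the result.
filtered = df[df['SCORE'] > 5].groupby('NAME')['SCORE'].size()
print('Bob' in filtered.index)

# Reindexing on all unique names restores him with a count of 0
# and puts the names back in their original order.
counts = filtered.reindex(df['NAME'].unique(), fill_value=0)
print(counts)
```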