Pandas groupby multiple columns with value_counts function
Pandas GroupBy a single column and display multiple columns as value counts
Sample df:
import pandas as pd

retailer_dict = {
    'id': [1, 2, 3, 1, 1, 3],
    'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
    'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual']
}
df = pd.DataFrame(retailer_dict)
print(df)
# Output
id gender category
0 1 Men western
1 2 Women formal
2 3 Men casual
3 1 Women western
4 1 Men formal
5 3 Women casual
I want to group by id and show the count of each value as the cell values.
What I've tried so far:
df.groupby('id')['gender'].value_counts()
# Output
id gender
1 Men 2
Women 1
2 Women 1
3 Men 1
Women 1
Name: gender, dtype: int64
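If a wide table of counts is acceptable instead of dicts, the MultiIndex Series above can be pivoted with `unstack`, which moves the `gender` level into columns. This is a minimal sketch assuming the six-row sample `df` from the question; `fill_value=0` fills id/gender pairs that never occur.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 1, 3],
    'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
    'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual'],
})

# Series indexed by (id, gender) -> count, same as the output above
counts = df.groupby('id')['gender'].value_counts()

# Pivot the gender level into columns; pairs that never occur become 0
wide = counts.unstack(fill_value=0)
print(wide)
```

This only handles one column at a time, which is why it does not directly answer the multi-column question.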
Also:
df.groupby('id')['gender'].apply(list)
But I can't figure out how to do the same for multiple columns.
For example:
# raises AttributeError
df.groupby('id')[['gender', 'category']].value_counts()
# gives unhelpful output
df.groupby('id')[['gender', 'category']].apply(list)
# Output
id
1 [gender, category]
2 [gender, category]
3 [gender, category]
dtype: object
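The `AttributeError` comes from older pandas: as far as I know, `DataFrameGroupBy.value_counts` was only added in pandas 1.4. Even where it exists, it counts each *combination* of `gender` and `category` within an id rather than each column separately, so it would not give the expected output either. A sketch, assuming pandas 1.4 or later and the question's `df`:

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 1, 3],
    'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
    'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual'],
})

# One count per (id, gender, category) combination, not one per column
combos = df.groupby('id')[['gender', 'category']].value_counts()
print(combos)
```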
Expected output:
id gender category
1 {Men: 2, Women:1} {western: 2, formal:1}
2 {Women:1} {formal:1}
3 {Men: 1, Women:1} {casual: 2}
Any questions or further suggestions would be helpful.
Use GroupBy.agg with value_counts and convert the result to a dict:
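# select the columns with a list; the tuple form ['gender', 'category'] raises an error in pandas 2.0+
print(df.groupby('id')[['gender', 'category']].agg(lambda x: x.value_counts().to_dict()))
Or:
from collections import Counter
print(df.groupby('id')[['gender', 'category']].agg(lambda x: dict(Counter(x))))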
gender category
id
1 {'Men': 2, 'Women': 1} {'western': 2, 'formal': 1}
2 {'Women': 1} {'formal': 1}
3 {'Women': 1, 'Men': 1} {'casual': 2}
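If you prefer not to rely on `agg` accepting dict-valued results, the same table can be built with an explicit loop over the groups and `DataFrame.from_dict`. This is a sketch assuming the question's six-row `df`; `cols` and `nested` are just local names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 1, 3],
    'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
    'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual'],
})
cols = ['gender', 'category']

# {id: {column: {value: count}}}, one value_counts() per group and column
nested = {
    gid: {col: group[col].value_counts().to_dict() for col in cols}
    for gid, group in df.groupby('id')
}

# Outer keys become the index, inner keys the columns; each cell holds a dict
table = pd.DataFrame.from_dict(nested, orient='index')
print(table)
```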
If you need the columns filled with lists instead, use agg with list:
print(df.groupby('id')[['gender', 'category']].agg(list))
gender category
id
1 [Men, Women, Men] [western, western, formal]
2 [Women] [formal]
3 [Men, Women] [casual, casual]
Using value_counts with multiple columns is problematic, because it creates a MultiIndex whose second level mixes the values of both columns:
print (pd.concat([df.groupby('id')['gender'].value_counts(),
df.groupby('id')['category'].value_counts()]))
id gender
1 Men 2
Women 1
2 Women 1
3 Men 1
Women 1
1 western 2
formal 1
2 formal 1
3 casual 2
dtype: int64
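One way to keep the two columns apart (a sketch, not part of the original answer) is to reshape to long format with `melt` first, so the source column name becomes its own index level instead of being mixed into the second level. This assumes the question's six-row `df`; `column` is an arbitrary name passed as `var_name`.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3, 1, 1, 3],
    'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
    'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual'],
})

# Long format: one row per (id, source column name, value)
long_df = df.melt(id_vars='id', value_vars=['gender', 'category'], var_name='column')

# Count values per id and per source column; index levels are id, column, value
counts = long_df.groupby(['id', 'column'])['value'].value_counts().sort_index()
print(counts)
```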
If I understand correctly, you can do this:
import pandas as pd

retailer_dict = {'id': [1, 2, 3, 1, 1, 3, 1, 2],
'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual','western','formal']}
df = pd.DataFrame(retailer_dict)
df['counter'] = 1
group_data = df.groupby(['id', 'gender', 'category'])['counter'].sum()
print (group_data)
Output:
id gender category
1 Men formal 1
western 2
Women western 1
2 Women formal 2
3 Men casual 1
Women casual 1
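The helper `counter` column is not strictly needed: `GroupBy.size()` counts the rows in each (id, gender, category) group directly and gives the same numbers as summing a column of 1s. A minimal sketch using the same eight-row data as this answer:

```python
import pandas as pd

retailer_dict = {'id': [1, 2, 3, 1, 1, 3, 1, 2],
                 'gender': ['Men', 'Women', 'Men', 'Women', 'Men', 'Women', 'Men', 'Women'],
                 'category': ['western', 'formal', 'casual', 'western', 'formal', 'casual', 'western', 'formal']}
df = pd.DataFrame(retailer_dict)

# Number of rows per (id, gender, category) combination
group_data = df.groupby(['id', 'gender', 'category']).size()
print(group_data)
```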
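Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.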