Pandas: How to group by one column and show count for unique values for all other columns per group?
(Data sample and attempts at the end of the question)
With a dataframe such as this:
Type Class Area Decision
0 A 1 North Yes
1 B 1 North Yes
2 C 2 South No
3 A 3 South No
4 B 3 South No
5 C 1 South No
6 A 2 North Yes
7 B 3 South Yes
8 B 1 North No
How can I group by Decision and get, for each Decision, a count of every unique value in the other columns, so that I end up with this:
Decision Area_North Area_South Class_1 Class_2 Type_A Type_B Type_C
Yes 3 1 2 0 2 2 1
No 1 4 1 1 1 2 2
I was sure I could get a good start using groupby().agg() like this:
dfg = df.groupby('Decision').agg({'Type':'count',
'Class':'count',
'Decision':'count'})
And then pivot the result, but that is not enough by far. I'll need to include the unique values of all other columns somehow. I was sure I had seen somewhere that you could replace 'Position':'count' with 'Position':pd.Series.unique, but I can't seem to get it to work.
Code:
import pandas as pd
df = pd.DataFrame({'Type': {0: 'A',
1: 'B',
2: 'C',
3: 'A',
4: 'B',
5: 'C',
6: 'A',
7: 'B',
8: 'B'},
'Class': {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 5: 1, 6: 2, 7: 3, 8: 1},
'Area': {0: 'North',
1: 'North',
2: 'South',
3: 'South',
4: 'South',
5: 'South',
6: 'North',
7: 'South',
8: 'North'},
'Decision': {0: 'Yes',
1: 'Yes',
2: 'No',
3: 'No',
4: 'No',
5: 'No',
6: 'Yes',
7: 'Yes',
8: 'No'}})
dfg = df.groupby('Decision').agg({'Type':'count',
'Class':'count',
'Decision':'count'})
dfg
Use DataFrame.melt with DataFrame.pivot_table and flatten the resulting MultiIndex columns:
df = df.melt('Decision').pivot_table(index='Decision',
                                     columns=['variable','value'],
                                     aggfunc='size',
                                     fill_value=0)
df.columns = df.columns.map('{0[0]}_{0[1]}'.format)
df = df.reset_index()
print(df)
Decision Area_North Area_South Class_1 Class_2 Class_3 Type_A Type_B \
0 No 1 4 2 1 2 1 2
1 Yes 3 1 2 1 1 2 2
Type_C
0 2
1 0
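To make the steps above easier to follow, here is a minimal sketch that splits the one-liner into its three stages on the question's sample data. The intermediate names `long`, `wide` and `result` are only illustrative, and the cast of `value` to `str` is an assumption added here (it is not part of the answer) so that integers and strings are not mixed when the column labels are sorted.

```python
import pandas as pd

# Sample data from the question, written as plain lists.
df = pd.DataFrame({
    'Type': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'B'],
    'Class': [1, 1, 2, 3, 3, 1, 2, 3, 1],
    'Area': ['North', 'North', 'South', 'South', 'South',
             'South', 'North', 'South', 'North'],
    'Decision': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No'],
})

# Stage 1: keep 'Decision' as the identifier and stack every other column
# into (variable, value) pairs, one row per original cell.
long = df.melt('Decision')

# Assumption (not in the answer): cast values to str so that Class (int)
# and Type/Area (str) can be sorted together without mixed-type issues.
long['value'] = long['value'].astype(str)

# Stage 2: count rows for each (Decision, variable, value) combination.
wide = long.pivot_table(index='Decision',
                        columns=['variable', 'value'],
                        aggfunc='size',
                        fill_value=0)

# Stage 3: flatten the two-level column labels to 'variable_value'.
wide.columns = [f'{variable}_{value}' for variable, value in wide.columns]
result = wide.reset_index()

print(result)
```

Inspecting `long` before pivoting is usually the quickest way to see why the approach works: each original cell becomes one row, so counting rows per group is the same as counting occurrences of each value.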
Use melt with groupby + value_counts:
s=df.melt('Decision').groupby(['Decision','variable']).\
value.value_counts().unstack(level=[1,2],fill_value=0)
variable Area Class Type
value South North 1 3 2 B C A
Decision
No 4 1 2 2 1 2 2 1
Yes 1 3 2 1 1 2 0 2
You can also flatten the columns above with:
s.columns = s.columns.map('{0[0]}_{0[1]}'.format)
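For reference, the groupby + value_counts approach can be put together end to end as in the sketch below, starting from the question's original dataframe. The `str` cast and the final `sort_index(axis=1)` are assumptions added here for a stable, alphabetical column order; they are not part of the answer.

```python
import pandas as pd

# Sample data from the question, written as plain lists.
df = pd.DataFrame({
    'Type': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'B'],
    'Class': [1, 1, 2, 3, 3, 1, 2, 3, 1],
    'Area': ['North', 'North', 'South', 'South', 'South',
             'South', 'North', 'South', 'North'],
    'Decision': ['Yes', 'Yes', 'No', 'No', 'No', 'No', 'Yes', 'Yes', 'No'],
})

s = (df.melt('Decision')
       .astype({'value': str})                 # assumption: avoid mixed int/str values
       .groupby(['Decision', 'variable'])['value']
       .value_counts()                         # counts per (Decision, variable, value)
       .unstack(level=[1, 2], fill_value=0))   # move variable and value into the columns

# Flatten the two-level column labels to 'variable_value'.
s.columns = s.columns.map('{0[0]}_{0[1]}'.format)

# Assumption: sort the flattened labels so the column order is predictable.
s = s.sort_index(axis=1).reset_index()

print(s)
```

Without the sort, the column order follows the order in which value_counts emits the groups, which is why the output above lists South before North.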