Groupby multiple columns to find the unique count of one column using Python pandas

I have a DataFrame like this:

column1    column2    column3
 ram        tall        good
 rohan      short       fine
 ajay       tall        best
 alia       tall        good
 aman       medium      fine
 john       short       good
 jack       short       fine

Now I need output like this:

Unique count of good in tall, short, and medium, based on column1:

tall=2, short=1, medium=0

Unique count of fine in tall, short, and medium, based on column1:

tall=0, short=2, medium=1

Unique count of best in tall, short, and medium, based on column1:

tall=1, short=0, medium=0

I am a beginner in pandas. Thanks in advance.

Let's try pd.crosstab:

pd.crosstab(df['column3'], df['column2'])

column2  medium  short  tall
column3                     
best          0      0     1
fine          1      2     0
good          0      1     2
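For anyone who wants to reproduce this, here is a minimal, self-contained sketch. It assumes the question's table is loaded into a DataFrame named df with the three column names shown; the variable names table and good_counts are only illustrative.

```python
import pandas as pd

# Sample data copied from the question (the name df is an assumption).
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

# Rows are the column3 values, columns are the column2 values,
# and each cell is the number of rows with that combination.
table = pd.crosstab(df["column3"], df["column2"])
print(table)

# Read off one row as a plain dict, e.g. the counts for "good".
good_counts = table.loc["good"].to_dict()
print(good_counts)
```

Note that crosstab counts rows. That matches the requested numbers here because every name in column1 appears only once.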

Use value_counts + unstack:

res = df[['column3', 'column2']].value_counts().unstack('column2', fill_value=0)
print(res)

Output

column2  medium  short  tall
column3                     
best          0      0     1
fine          1      2     0
good          0      1     2

As an alternative, use groupby + unstack:

res = df.groupby(['column3', 'column2']).count().unstack('column2', fill_value=0)
print(res)

Output (groupby)

        column1           
column2  medium short tall
column3                   
best          0     0    1
fine          1     2    0
good          0     1    2
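The question asks for a unique count based on column1, while count() counts rows. With the sample data the two agree, but if a name could repeat, nunique() may be closer to what was asked. The sketch below is only an illustration: the repeated "ram" row is hypothetical and is not part of the question's data. Selecting column1 before aggregating also avoids the extra column1 header level seen in the output above.

```python
import pandas as pd

# Sample data copied from the question (the name df is an assumption).
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

# Hypothetical: append a duplicate of the first row ("ram", tall, good)
# to show where row counts and unique counts diverge.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)

grouped = dup.groupby(["column3", "column2"])["column1"]

# count() counts rows per group, so the duplicate is counted twice.
row_counts = grouped.count().unstack("column2", fill_value=0)

# nunique() counts distinct column1 values per group.
unique_counts = grouped.nunique().unstack("column2", fill_value=0)

print(row_counts)
print(unique_counts)
```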

The idea behind both approaches is to build an index from column3 and column2 and then unstack the column2 level into columns. If you want to match the column order specified in your question, convert column2 to Categorical first:

df['column2'] = pd.Categorical(df['column2'], categories=['tall', 'short', 'medium'], ordered=True)
res = df[['column3', 'column2']].value_counts().unstack('column2', fill_value=0)
print(res) 

Output

column2  tall  short  medium
column3                     
best        1      0       0
fine        0      2       1
good        2      1       0
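If you would rather not change the dtype of column2, another option is to reorder the columns of the finished table with reindex. This is a sketch under the same assumptions as before (a DataFrame named df holding the question's data); the names order and table are illustrative.

```python
import pandas as pd

# Sample data copied from the question (the name df is an assumption).
df = pd.DataFrame({
    "column1": ["ram", "rohan", "ajay", "alia", "aman", "john", "jack"],
    "column2": ["tall", "short", "tall", "tall", "medium", "short", "short"],
    "column3": ["good", "fine", "best", "good", "fine", "good", "fine"],
})

# Desired column order, taken from the question.
order = ["tall", "short", "medium"]

# Build the table, then put the columns in the requested order.
# fill_value=0 only matters if a label in `order` is absent from the data.
table = pd.crosstab(df["column3"], df["column2"]).reindex(columns=order, fill_value=0)
print(table)
```

The Categorical approach has the advantage that the order travels with the column, so every later operation respects it. reindex only affects this one table.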
