Groupby: count the frequency of unique values in a set

Question

I have a DataFrame like this:

   User      Dept     
    1        Cook
    1        Cook
    1        Home
    2        Sports
    2        Travel
    2        Cook

I want to count the unique users within each department:

   Dept      User
   Cook      2
   Home      1
   Sports    1
   Travel    1

Notice that the department Cook only has a count of two: even though 'Cook' appears in three rows, those rows contain only two unique users.

I have tried the following:

 df.groupby(['Dept']).count()  # counts 'Cook' three times
 df.drop_duplicates(['Dept']).groupby('Dept')['User'].sum()  # over counts all departments

I know the answer is a groupby, I just can't seem to figure it out!

Answer 1

You could use nunique:

>>> df.groupby("Dept")["User"].nunique()
Dept
Cook      2
Home      1
Sports    1
Travel    1
Name: User, dtype: int64
>>> df.groupby("Dept")["User"].nunique().reset_index()
     Dept  User
0    Cook     2
1    Home     1
2  Sports     1
3  Travel     1

(Note that I used your example data, which only has one unique user in Sports.)
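
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame construction is a reconstruction of the example table from the question, and the names `unique_users` and `via_dedup` are only for illustration. It also shows one possible alternative that deduplicates first, which should give the same counts.

```python
import pandas as pd

# Rebuild the example table from the question
# (assumption: User is an integer column, Dept is a string column).
df = pd.DataFrame({
    "User": [1, 1, 1, 2, 2, 2],
    "Dept": ["Cook", "Cook", "Home", "Sports", "Travel", "Cook"],
})

# Count the distinct users within each department.
unique_users = df.groupby("Dept")["User"].nunique()

# Alternative: drop repeated (User, Dept) pairs, then count the remaining rows per department.
via_dedup = df.drop_duplicates(["User", "Dept"]).groupby("Dept")["User"].count()

print(unique_users)
print(via_dedup)
```

The alternative differs from the second attempt in the question in two ways: duplicates are dropped on the (User, Dept) pair rather than on Dept alone, and the remaining rows are counted rather than summed.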

Answer 1: score 3, accepted, 2015-10-13 00:05:09
