How to group by unique values with pandas groupby

Before I ask my question, I want to note that I looked at the following page, but it did not give me what I specifically need:

Count unique values using pandas groupby

Let's say I have the following df of four individuals trying to guess a code. Each individual has two guesses:

import pandas as pd

df = pd.DataFrame({'name':['Sally', 'John', 'Lucy', 'Mark','Sally', 'John', 'Lucy', 'Mark'],
                   'guess':['123', '321', '213', '312', '321', '231', '123', '312']})

df

    name    guess
0   Sally   123
1   John    321
2   Lucy    213
3   Mark    312
4   Sally   321
5   John    231
6   Lucy    123
7   Mark    312

I want to know how many completely unique guesses each individual has. That is, I don't want to know how many unique guesses each individual has among their own guesses; rather, I want to know how many of their guesses are unique across all guesses. Let me elaborate.

Using the code from the post linked above, this is what I get:

df.groupby('name')[['guess']].nunique()


      guess
name    
John    2
Lucy    2
Mark    1
Sally   2

This returns how many unique guesses each individual has when compared to their own guesses. Again, what I am looking for is how many of each individual's guesses are unique across all guesses (i.e., the entire column). This is the output I am looking for:

      guess count
name    
John    1     2
Lucy    1     2
Mark    0     2
Sally   0     2

This is because one of John's guesses (231) and one of Lucy's guesses (213) are unique among all guesses. It would also be nice to have a column showing each individual's total guess count.

Thank you in advance!

You can first find out which guesses are unique by grouping by guess and flagging those that occur exactly once. A sum and count of that flag, grouped by name, then gives you the final output:

In [64]: df['unique'] = df['guess'].map(df.groupby("guess").count()['name'] == 1).astype(int)

In [65]: df.groupby("name")['unique'].agg(['sum', 'count']).rename(columns={'sum': 'guess'})
Out[65]:
       guess  count
name
John     1      2
Lucy     1      2
Mark     0      2
Sally    0      2
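
For what it's worth, here is a self-contained sketch of the same idea that you can paste and run. It is not the code from the answer above: it swaps the groupby/map step for Series.duplicated(keep=False), which marks every value that occurs more than once in the column, and it uses named aggregation instead of rename. The variable names is_unique and result are just my own choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Sally', 'John', 'Lucy', 'Mark', 'Sally', 'John', 'Lucy', 'Mark'],
    'guess': ['123', '321', '213', '312', '321', '231', '123', '312'],
})

# duplicated(keep=False) is True for every occurrence of a repeated value,
# so negating it leaves True only for guesses that appear exactly once
# in the whole column.
is_unique = ~df['guess'].duplicated(keep=False)

# Per name: sum the flag (globally unique guesses) and count all guesses.
result = (
    df.assign(unique=is_unique.astype(int))
      .groupby('name')['unique']
      .agg(guess='sum', count='count')
)
print(result)
```

Note that under this definition Mark's repeated guess of 312 does not count as unique, since it occurs twice in the column even though both occurrences are his own. That matches the desired output in the question.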
