
Pandas nunique(), but only counting values where value_counts() > 1

I have a DataFrame with user IDs. users['user id'].nunique() returns the number of unique users, and users['user id'].value_counts() returns the count for each unique user ID. Is there a way to combine the two to get the number of user IDs that appear more than once (i.e. 2 or more times)?

Any suggestions are much appreciated.

You could use a boolean mask on the output of value_counts():

>>> import pandas as pd
>>> d = {'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']}
>>> users = pd.DataFrame(data=d)
>>> users
  user_id
0   Apple
1  Banana
2  Carrot
3  Carrot
4   Apple
>>> counts = users['user_id'].value_counts()
>>> counts
Carrot    2
Apple     2
Banana    1
Name: user_id, dtype: int64
>>> counts_greater_than_1 = counts[counts > 1]
>>> counts_greater_than_1
Carrot    2
Apple     2
Name: user_id, dtype: int64
>>> len(counts_greater_than_1)
2
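
If only the final number is needed, the mask itself can be summed, since each True counts as 1. A minimal sketch reusing the sample data above (the names repeated and n_repeated are only illustrative):

```python
import pandas as pd

users = pd.DataFrame({'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# value_counts() gives occurrences per ID; comparing with 1 yields a boolean Series
repeated = users['user_id'].value_counts() > 1

# Summing booleans counts the True entries, i.e. IDs seen at least twice
n_repeated = int(repeated.sum())
print(n_repeated)  # 2
```

This skips the intermediate filtered Series, which is convenient when the individual counts are not needed.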

There are other ways of getting the number of user IDs that appear more than once. You can use duplicated(keep=False) to build a mask that marks every row whose value is duplicated, and then view only the rows whose user ID appears more than once:

mask=users['user id'].duplicated(keep=False)
print(users[mask])

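Now, if you want to know how many such rows there are in total, you can use users[mask].count(); note that this counts rows, not distinct user IDs. You can also check how many times each of them repeats with users[mask].groupby(by='user id').size() (use .count() instead if you want the non-null counts of the other columns per user ID).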
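
To get back to the number the question asks for, the same mask can be combined with nunique(). This is a rough sketch on made-up sample data, assuming the column is named 'user id' as in the question; n_rows, n_ids and per_id are illustrative names:

```python
import pandas as pd

# Sample data, assumed for illustration only
users = pd.DataFrame({'user id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# True for every row whose user ID occurs more than once
mask = users['user id'].duplicated(keep=False)

# Number of rows that belong to repeated IDs
n_rows = int(mask.sum())

# Number of distinct IDs among those rows
n_ids = users.loc[mask, 'user id'].nunique()

# Occurrences of each repeated ID
per_id = users[mask].groupby('user id').size()

print(n_rows, n_ids)
print(per_id)
```

Here n_ids is the count of user IDs that appear two or more times, while n_rows is the total number of rows those IDs occupy.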
