I have a DataFrame with user IDs. users['user id'].nunique() returns the number of unique users, and users['user id'].value_counts() returns the count of each unique user ID. Is there a way to combine the two so that I get the number of user IDs that appear more than once (i.e. 2 or more times)?
Any suggestions much appreciated.
You could use a mask on the output of value_counts:
>>> import pandas as pd
>>> d = {'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']}
>>> users = pd.DataFrame(data=d)
>>> users
user_id
0 Apple
1 Banana
2 Carrot
3 Carrot
4 Apple
>>> counts = users['user_id'].value_counts()
>>> counts
Carrot 2
Apple 2
Banana 1
Name: user_id, dtype: int64
>>> counts_greater_than_1 = counts[counts > 1]
>>> counts_greater_than_1
Carrot 2
Apple 2
Name: user_id, dtype: int64
>>> len(counts_greater_than_1)
2
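The mask and len() steps can also be collapsed into a single expression. This is a minimal sketch using the same sample data as the transcript: comparing the counts with > 1 gives a boolean Series, and summing it counts the True values.

```python
import pandas as pd

# Same sample data as in the transcript
users = pd.DataFrame({'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# value_counts() gives the number of occurrences per ID; "> 1" turns that into
# True/False per ID, and sum() counts the IDs seen 2 or more times
n_repeated = int((users['user_id'].value_counts() > 1).sum())
print(n_repeated)  # 2
```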
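There are other ways of getting the number of user IDs that appeared more than once. You can use duplicated(keep=False) to create a mask that marks every row whose ID is repeated, and then view the DataFrame with only the values that appeared more than once:
mask = users['user id'].duplicated(keep=False)
print(users[mask])
If you want to know how many rows those are in total, you can use users[mask].count(). Note that this counts rows, not distinct IDs; for the number of distinct repeated IDs use users.loc[mask, 'user id'].nunique().
You can also check how many times each of them repeats with users[mask].groupby(by='user id').size(). (Using .count() here instead would return no columns if 'user id' is the only column, since the grouping column is not counted.)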
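Putting the duplicated() approach together, here is a minimal runnable sketch. The sample data is an assumption that mirrors the first answer, but with the column named 'user id' as in the question. It shows the difference between counting rows with a repeated ID and counting distinct repeated IDs.

```python
import pandas as pd

# Hypothetical sample data; column name 'user id' (with a space) as in the question
users = pd.DataFrame({'user id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# keep=False marks every occurrence of a repeated ID, not only the later ones
mask = users['user id'].duplicated(keep=False)

rows_with_repeats = int(mask.sum())                       # rows whose ID repeats: 4
distinct_repeated = users.loc[mask, 'user id'].nunique()  # distinct repeated IDs: 2
per_id = users[mask].groupby('user id').size()            # occurrences per repeated ID

print(rows_with_repeats, distinct_repeated)
print(per_id)
```

The distinct count (2) is the number the question asks for; the row count (4) is what users[mask].count() reports.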