Pandas nunique(), but only counting ids where value_counts() > 1

Question

I have a dataframe with user IDs. users['user id'].nunique() returns the number of unique users, and users['user id'].value_counts() returns the count of each unique user id. Is there a way to combine the two to get the number of user ids that appear more than once (i.e. 2 or more times)?

Any suggestions are much appreciated.

Answer 1

You could use a boolean mask on the output of value_counts():

>>> import pandas as pd
>>> d = {'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']}
>>> users = pd.DataFrame(data=d)
>>> users
  user_id
0   Apple
1  Banana
2  Carrot
3  Carrot
4   Apple
>>> counts = users['user_id'].value_counts()
>>> counts
Carrot    2
Apple     2
Banana    1
Name: user_id, dtype: int64
>>> counts_greater_than_1 = counts[counts > 1]
>>> counts_greater_than_1
Carrot    2
Apple     2
Name: user_id, dtype: int64
>>> len(counts_greater_than_1)
2
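
As a compact variant of the same idea, the comparison can be summed directly, since each True counts as 1. This is only a minimal sketch that reuses the sample data above; the column name user_id and the variable names come from this example, not from the question.

```python
import pandas as pd

# Same sample data as in the session above
users = pd.DataFrame({'user_id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# value_counts() gives occurrences per id; comparing with 1 yields a boolean Series
repeated = users['user_id'].value_counts() > 1

# Summing booleans counts the True entries, i.e. ids that appear 2 or more times
n_repeated = int(repeated.sum())
print(n_repeated)  # 2
```

This avoids building the intermediate filtered Series when only the number is needed.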

Answer 2

There are other ways of getting the number of user ids that appear more than once. You can use duplicated(keep=False) to create a mask that flags every row whose value is duplicated, and then use it to view only the rows whose user id appears more than once:

mask=users['user id'].duplicated(keep=False)
print(users[mask])

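If you want to know how many such rows there are in total, you can use users[mask].count(). Note that this counts rows, not distinct ids; for the number of distinct ids that appear more than once, users.loc[mask, 'user id'].nunique() should work. You can also check how many times each of them repeats with users[mask].groupby(by='user id').size() (size() is used here rather than count() because count() only reports the other columns, so it returns nothing useful when 'user id' is the only column).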
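
The duplicated() approach can be sketched end to end as follows. The sample data is hypothetical, the column name 'user id' follows the question, and the variable names n_rows, n_ids and per_id are only illustrative.

```python
import pandas as pd

# Hypothetical sample data; the column name 'user id' follows the question
users = pd.DataFrame({'user id': ['Apple', 'Banana', 'Carrot', 'Carrot', 'Apple']})

# True for every row whose id occurs more than once (keep=False marks all copies)
mask = users['user id'].duplicated(keep=False)

# Number of rows that belong to a repeated id
n_rows = int(mask.sum())

# Number of distinct ids that appear more than once
n_ids = users.loc[mask, 'user id'].nunique()

# Occurrences of each repeated id
per_id = users[mask].groupby('user id').size()

print(n_rows, n_ids)
print(per_id)
```

With this data, four rows belong to repeated ids, but only two distinct ids are repeated, which is the number the question asks for.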

2 answers

solution1
0 2020-11-13 01:22:05

solution2
0 2020-11-13 01:30:26