How to sort a pandas data frame by value counts of a column?

Question

I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().

import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

The sort should be stable: user 2's rows should come before user 1's rows, and within each user the rows should keep their original order.

I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, or converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.

Thanks in advance for any help!

Answer 1

`transform` and `argsort`

Use kind='mergesort' for stability

df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
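To make the one-liner above easier to follow, here is a minimal sketch that splits it into steps. The small frame `toy` and its values are hypothetical and used only for illustration; `.to_numpy()` is added so that `iloc` receives plain integer positions.

```python
import pandas as pd

# Hypothetical toy data: 'a' appears 3 times, 'b' twice, 'c' once.
toy = pd.DataFrame({'user_id': ['a', 'b', 'a', 'c', 'b', 'a']})

# Step 1: per-row group size, aligned with the original rows.
counts = toy.groupby('user_id')['user_id'].transform('size')

# Step 2: positions that sort the counts ascending.
# mergesort is stable, so rows with equal counts keep their original order.
order = counts.argsort(kind='mergesort').to_numpy()

# Step 3: reorder the frame by position.
result = toy.iloc[order]
```

Here `counts` is 3, 2, 3, 1, 2, 3, so the least frequent user ('c') comes first and the rows of each user stay in their original relative order.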

`factorize`, `bincount`, and `argsort`

Use kind='mergesort' for stability

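import numpy as np
# i holds an integer code for each row's user_id; r holds the unique values
i, r = pd.factorize(df['user_id'])
# np.bincount(i)[i] is the per-row count; a stable argsort keeps ties in original order
a = np.argsort(np.bincount(i)[i], kind='mergesort')
df.iloc[a]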
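The intermediate arrays in the NumPy variant can be inspected one at a time. This is a sketch on hypothetical toy data; the names `codes`, `uniques`, `group_sizes`, and `per_row` are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'a' appears 3 times, 'b' twice, 'c' once.
toy = pd.DataFrame({'user_id': ['a', 'b', 'a', 'c', 'b', 'a']})

# factorize assigns an integer code to each distinct value, in order of first appearance.
codes, uniques = pd.factorize(toy['user_id'])

# bincount counts how often each code occurs: one entry per distinct user.
group_sizes = np.bincount(codes)

# Indexing the group sizes by the codes broadcasts each count back to its rows.
per_row = group_sizes[codes]

# A stable argsort gives the row positions in ascending order of count.
order = np.argsort(per_row, kind='mergesort')
result = toy.iloc[order]
```

This stays in NumPy after the initial `factorize` call and does not build a groupby object.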

Response to Comments

Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham 4 mins ago

Quick and Dirty

Make the counts negative

df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]

Or

i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
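As a check on the question's data, the sketch below runs the negated-count approach and compares it with a different technique that is not part of the answer: `DataFrame.sort_values` with a `key` function. That alternative assumes pandas 1.1 or later, where the `key` parameter is available.

```python
import pandas as pd

# Data frame from the question: user 1 has 10 rows, user 2 has 90 rows.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

# The answer's approach: negate the per-row counts, then take a stable argsort.
neg_counts = df.groupby('user_id')['user_id'].transform('size').mul(-1)
by_argsort = df.iloc[neg_counts.argsort(kind='mergesort').to_numpy()]

# Alternative (assumes pandas >= 1.1): map each user_id to its negated count
# inside the sort key and let sort_values do a stable ascending sort.
by_key = df.sort_values(
    'user_id',
    key=lambda s: -s.map(s.value_counts()),
    kind='mergesort',
)
```

Both results put user 2's rows (gridimage_id 11 to 100) first, followed by user 1's rows (gridimage_id 1 to 10), each in original order.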
