
How to sort a pandas data frame by value counts of a column?

I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().

import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

The sort should be stable: user 2's rows would come before user 1's rows, but within each user the rows would stay in their original order.

I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, and also converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.

Thanks in advance for any help!

transform and argsort

Use kind='mergesort' for stability. Note that this orders the rows by ascending count.

df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
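To see what the one-liner is doing, here is a minimal sketch on the example frame from the question. The names sizes, order and ascending are only illustrative, and the per-row sizes are converted to a NumPy array before sorting so that order is a plain array of integer positions.

```python
import pandas as pd

# Rebuild the example frame from the question.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

# Per-row group size: 10 for each of user 1's rows, 90 for each of user 2's rows.
sizes = df.groupby('user_id')['user_id'].transform('size')

# A stable argsort returns integer positions; rows with equal counts keep their original order.
order = sizes.to_numpy().argsort(kind='mergesort')
ascending = df.iloc[order]
```

With this data the smaller group (user 1) comes first, so the result is in ascending order of count.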

factorize, bincount, and argsort

Use kind='mergesort' for stability

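import numpy as np

i, r = pd.factorize(df['user_id'])  # i: integer code per row, r: the unique user ids
a = np.argsort(np.bincount(i)[i], kind='mergesort')  # stable sort of the per-row counts
df.iloc[a]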
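The intermediate values may make this clearer. The sketch below uses a standalone Series holding the same values as the example's user_id column (an assumption made to keep it self-contained), and the names codes, counts and per_row are illustrative only.

```python
import numpy as np
import pandas as pd

# Same values as the example column: ten rows for user 1, ninety for user 2.
user_id = pd.Series([1] * 10 + [2] * 90, name='user_id')

codes, uniques = pd.factorize(user_id)  # codes: 0 for user 1, 1 for user 2
counts = np.bincount(codes)             # occurrences of each code
per_row = counts[codes]                 # each row receives the count of its own user

# The per-row counts should match what groupby/transform('size') produces.
same = (per_row == user_id.groupby(user_id).transform('size').to_numpy()).all()
```

Indexing counts with codes is what turns the per-user counts into one count per row, which is the array that gets argsorted.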

Response to Comments

Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham

Quick and Dirty

Make the counts negative

df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]

Or

i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
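As a rough sketch, the negated-count approach could be wrapped in a small function and checked on interleaved data. The helper name sort_by_value_counts and the sample frame are hypothetical, and the sketch assumes the column has no missing values (factorize codes them as -1, which np.bincount rejects).

```python
import numpy as np
import pandas as pd

def sort_by_value_counts(frame, column):
    """Hypothetical helper: most frequent values first, ties kept in original order."""
    codes, _ = pd.factorize(frame[column])   # assumes no missing values in the column
    per_row_counts = np.bincount(codes)[codes]
    # Negating the counts makes the stable ascending sort act as a descending one.
    order = np.argsort(-per_row_counts, kind='mergesort')
    return frame.iloc[order]

# Interleaved sample: user 2 appears three times, user 3 twice, user 1 once.
frame = pd.DataFrame({'user_id': [1, 2, 2, 3, 2, 3]})
result = sort_by_value_counts(frame, 'user_id')
```

Negating before a stable sort, rather than reversing the sorted result, is what keeps the rows of each user in their original relative order.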
