I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().
import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1
The sort should be stable: user 2's rows should come before user 1's rows, but within each user the rows should stay in their original order.
I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, and also converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.
Thanks in advance for any help!
transform and argsort
Use kind='mergesort' for stability.
df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
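To see what the one-liner above does step by step, here is a minimal sketch on a made-up five-row frame (the `toy` data is hypothetical, not from the question). It converts the sizes to a NumPy array before calling argsort, which is my own choice to keep the result purely positional; the answer itself calls argsort on the Series directly.

```python
import pandas as pd

# Hypothetical toy data: user 2 has 3 rows, user 1 has 2 rows.
toy = pd.DataFrame(
    {"user_id": [2, 1, 2, 1, 2]},
    index=pd.Index([1, 2, 3, 4, 5], name="gridimage_id"),
)

# Per-row group size, aligned to the original rows: 3, 2, 3, 2, 3.
sizes = toy.groupby("user_id")["user_id"].transform("size")

# Positions that sort the sizes in ascending order.
# mergesort is stable, so rows with equal sizes keep their original order.
order = sizes.to_numpy().argsort(kind="mergesort")

result = toy.iloc[order]
print(result)
```

The smaller group (user 1) comes first here because the sort is ascending.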
factorize, bincount, and argsort
Use kind='mergesort' for stability.
import numpy as np
i, r = pd.factorize(df['user_id'])
a = np.argsort(np.bincount(i)[i], kind='mergesort')
df.iloc[a]
Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham 4 mins ago
Make the counts negative
df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]
Or
i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
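As a quick check, the negated-count version can be run end to end on the frame from the question. This is a sketch under the question's setup; the variable names `codes`, `uniques`, `counts`, `order` and `out` are mine.

```python
import numpy as np
import pandas as pd

# Data frame from the question: 10 rows for user 1, 90 rows for user 2.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name="gridimage_id"))
df["user_id"] = 2
df["has_term"] = True
df.iloc[:10, 0] = 1

# Integer code per row, then the size of each row's group.
codes, uniques = pd.factorize(df["user_id"])
counts = np.bincount(codes)[codes]

# Negating the counts turns the ascending stable sort into a descending one,
# while ties (rows of the same user) keep their original order.
order = np.argsort(-counts, kind="mergesort")
out = df.iloc[order]

print(out.head())
print(out.tail())
```

User 2's rows (gridimage_id 11 to 100) come first, followed by user 1's rows (gridimage_id 1 to 10), each block in its original order.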