How to sort a pandas data frame by value counts of a column?

I'd like to sort the following pandas data frame by the result of df['user_id'].value_counts().

import pandas as pd
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n+1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1

The sort should be stable: user 2's rows should come before user 1's rows, but within each user the rows should stay in their original order.

I was thinking about using df.groupby, merging df['user_id'].value_counts() with the data frame, or converting df['user_id'] to ordered categorical data. However, none of these approaches seemed particularly elegant.

Thanks in advance for any help!

transform and argsort

Use kind='mergesort' for stability

df.iloc[df.groupby('user_id').user_id.transform('size').argsort(kind='mergesort')]
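To see what each step contributes, here is a minimal sketch on a hypothetical six-row frame. The toy data and the names toy, counts, order and result are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical toy data: 'a' appears 3 times, 'b' once, 'c' twice.
toy = pd.DataFrame({'user_id': ['a', 'b', 'a', 'c', 'a', 'c']})

# Per-row group size: every row receives the count of its own user_id.
counts = toy.groupby('user_id')['user_id'].transform('size')

# A stable argsort yields row positions in ascending-count order;
# rows with equal counts keep their original relative order.
order = counts.to_numpy().argsort(kind='mergesort')

result = toy.iloc[order]
print(result)
```

The counts here are 3, 1, 3, 2, 3, 2, so the rows come out as b, c, c, a, a, a, with each user's rows in their original order.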

factorize, bincount, and argsort

Use kind='mergesort' for stability

import numpy as np
i, r = pd.factorize(df['user_id'])  # i: integer code per row, r: unique user ids
a = np.argsort(np.bincount(i)[i], kind='mergesort')  # stable sort of per-row counts
df.iloc[a]
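The intermediate arrays may make this easier to follow. The sketch below uses a hypothetical six-value column; the data and the names codes, uniques, per_code and per_row are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: 'a' appears 3 times, 'b' once, 'c' twice.
user_id = pd.Series(['a', 'b', 'a', 'c', 'a', 'c'])

# factorize maps each distinct value to an integer code (in order of first appearance).
codes, uniques = pd.factorize(user_id)

# bincount counts how often each code occurs: one count per distinct user.
per_code = np.bincount(codes)

# Indexing by the codes broadcasts each user's count back onto its rows.
per_row = per_code[codes]

# Stable argsort of the per-row counts gives the row order.
order = np.argsort(per_row, kind='mergesort')
print(user_id.iloc[order].tolist())
```

This produces the same per-row counts as the groupby/transform version, but stays in NumPy arrays throughout.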

Response to Comments

Thank you @piRSquared. Is it possible to reverse the sort order, though? value_counts is in descending order. In the example, user 2 has 90 rows and user 1 has 10 rows. I'd like user 2's rows to come first. Unfortunately, Series.argsort ignores the order kwarg. – Iain Dillingham 4 mins ago

Quick and Dirty

Make the counts negative, so that the ascending stable sort puts the largest groups first

df.iloc[df.groupby('user_id').user_id.transform('size').mul(-1).argsort(kind='mergesort')]

Or

i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
df.iloc[a]
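As a quick sanity check, the sketch below rebuilds the frame from the question and applies the negated-count version. The name out is an assumption introduced here for the result.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
n = 100
df = pd.DataFrame(index=pd.Index(range(1, n + 1), name='gridimage_id'))
df['user_id'] = 2
df['has_term'] = True
df.iloc[:10, 0] = 1  # first 10 rows belong to user 1

# Negate the per-row counts so the most frequent user sorts first.
i, r = pd.factorize(df['user_id'])
a = np.argsort(-np.bincount(i)[i], kind='mergesort')
out = df.iloc[a]

# User 2's 90 rows (gridimage_id 11..100) come first, then user 1's 10 rows (1..10).
print(out.index[:3].tolist(), out.index[-3:].tolist())
```

Because negation keeps ties as ties, the stable sort still preserves the original order within each user.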
