Sort pandas dataframe by sum of columns

Question

I have a dataframe that looks like this:

            Australia  Austria    United Kingdom  Vietnam
date                                                    
2020-01-30          9        0                 1       2
2020-01-31          9        9                 4       2

I would like to create a new dataframe that includes only the countries whose column sum is > 4, and I do it like this:

df1 = df[[i for i in df.columns if int(df[i].sum()) > 4]]
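The list comprehension calls df[i].sum() once per column. The same filter can also be written as a single boolean mask over the column totals. A minimal sketch: the frame is rebuilt by hand from the sample above, and the date index name and integer values are assumptions about how the original frame was loaded.

```python
import pandas as pd

# Rebuild the sample frame from the question (construction details are assumed)
df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)

# df.sum() returns one total per column; comparing it with 4 gives a boolean Series
mask = df.sum() > 4

# .loc[:, mask] keeps every row and only the columns where the mask is True
df1 = df.loc[:, mask]
print(df1)
```

Vietnam (total 4) is dropped because the comparison is strictly greater than 4.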

This gives me:

            Australia  Austria    United Kingdom  
date                                                     
2020-01-30          9        0                 1      
2020-01-31          9        9                 4

I would now like to sort the countries by the sum of their column and then take the first 2:

            Australia  Austria   
date                                    
2020-01-30          9        0        
2020-01-31          9        9

I know I have to use sort_values and tail. I just can't work out how.
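For reference, the sort_values + tail route the question has in mind could look like the sketch below, on the same hand-built sample data (construction assumed). An ascending sort puts the largest totals at the end, so tail(2) picks the top two, but still in ascending order, which is why the labels are reversed before selecting.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)

totals = df.sum()                    # per-column totals
totals = totals[totals > 4]          # keep only countries with a total above 4
top2 = totals.sort_values().tail(2)  # ascending sort, so the last two are the largest
cols = top2.index[::-1]              # reverse so the largest total comes first
result = df[cols]
print(result)
```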

Answer 1

IIUC, you can do:

s = df.sum()
df[s.sort_values(ascending=False).index[:2]]

Output:

            Australia  Austria
date                          
2020-01-30          9        0
2020-01-31          9        9
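A self-contained version of the snippet above, on the same hand-built sample data (construction assumed). Sorting the totals in descending order and slicing the first n labels gives the columns to keep; n = 2 matches the question.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)
n = 2  # number of columns to keep

s = df.sum()  # column totals: Australia 18, Austria 9, United Kingdom 5, Vietnam 4
top_labels = s.sort_values(ascending=False).index[:n]  # labels of the n largest totals
result = df[top_labels]
print(result)
```

This answer skips the > 4 filter; because the two largest totals (18 and 9) both exceed 4, the filter would not change the result for this data.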

Answer 2

First filter for sums greater than 4, then use Series.nlargest to get the top 2 sums, and select the columns by the resulting index values:

s = df.sum()

df = df[s[s > 4].nlargest(2).index]
print (df)
            Australia  Austria
date                          
2020-01-30          9        0
2020-01-31          9        9
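Series.nlargest(n) returns exactly n entries by default, so if two countries tied for second place, one of them would be silently dropped. Its keep parameter controls this: keep="all" keeps every tied entry, even if that yields more than n. A small sketch with made-up tied totals (the values below are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical column totals where two countries tie for second place
s = pd.Series({"Australia": 18, "Austria": 9, "United Kingdom": 9, "Vietnam": 4})

default_top = s[s > 4].nlargest(2)            # exactly 2 entries; keep="first" is the default
with_ties = s[s > 4].nlargest(2, keep="all")  # also keeps the entry tied for second place
print(default_top.index.tolist())
print(with_ties.index.tolist())
```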

Details:

print (s)
Australia         18
Austria            9
United Kingdom     5
Vietnam            4
dtype: int64

print (s[s > 4])
Australia         18
Austria            9
United Kingdom     5
dtype: int64

print (s[s > 4].nlargest(2))
Australia    18
Austria       9
dtype: int64

print (s[s > 4].nlargest(2).index)
Index(['Australia', 'Austria'], dtype='object')

Answer 3

You can take the sum of the dataframe along the first axis (axis 0, which gives one total per column), sort it with sort_values, and take the first n columns:

df[df.sum(0).sort_values(ascending=False)[:2].index]

            Australia  Austria
2020-01-30          9        0
2020-01-31          9        9
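"Along the first axis" means axis=0: pandas collapses the rows and returns one total per column, which is also the default for DataFrame.sum(). axis=1 would instead give one total per date. A small sketch on the hand-built sample (construction assumed); it uses .iloc[:2], which makes the positional slice explicit instead of relying on [:2]:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)

col_totals = df.sum(axis=0)  # one value per country (same as df.sum())
row_totals = df.sum(axis=1)  # one value per date

top2 = col_totals.sort_values(ascending=False).iloc[:2]  # explicit positional slice
result = df[top2.index]
print(row_totals)
print(result)
```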

Answer 4

Another way, modifying your list comprehension slightly:

cols = df[[i for i in df.columns if int(df[i].sum()) > 4]].stack().groupby(level=1).sum().sort_values(ascending=False).head(2).index

#would yield the same result: df.stack().groupby(level=1).sum().sort_values(ascending=False).head(2).index


df[cols]

            Australia  Austria
date                          
2020-01-30          9        0
2020-01-31          9        9
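A sort by total is what makes this approach pick the largest columns: stack() turns the frame into a long Series indexed by (date, country), and groupby(level=1).sum() adds the values back up per country, which reproduces df.sum(). The groupby result is ordered by country label, not by total, so head(2) on its own would just take the first two names alphabetically (which happen to be the right ones for this data). A sketch on the hand-built sample (construction assumed):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)

stacked = df.stack()  # long Series indexed by (date, country)
per_country = stacked.groupby(level=1).sum()  # ordered by country label, not by total

# Same totals as summing each column directly
assert per_country.to_dict() == df.sum().to_dict()

top2 = per_country.sort_values(ascending=False).head(2).index
result = df[top2]
print(result)
```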

Answer 5

You can also do this inline using the .pipe function, which helps if you don't want to define a variable for a temporary result:

df.pipe(lambda df: df.loc[:, df.sum().sort_values(ascending=False).index])

For example, you might have a pipeline:

new_df = (
    df1
    # Some example operations one might do:
    .groupby('column')
    .apply(sum).unstack()
    .fillna(0).astype(int)
    # Sort columns by total count:
    .pipe(lambda df: df.loc[:, df.sum().sort_values(ascending=False).index])
)
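If the pipeline should also keep only the top columns, a small named function can be passed to .pipe with its own arguments, since DataFrame.pipe(func, *args, **kwargs) forwards the extra arguments to func. top_columns_by_sum is a hypothetical helper written for this sketch, shown on the hand-built sample data (construction assumed):

```python
from typing import Optional

import pandas as pd


def top_columns_by_sum(frame: pd.DataFrame, n: int, threshold: Optional[float] = None) -> pd.DataFrame:
    """Hypothetical helper: keep the n columns with the largest totals, largest first."""
    totals = frame.sum()
    if threshold is not None:
        totals = totals[totals > threshold]  # optional "sum > threshold" filter, as in the question
    return frame.loc[:, totals.nlargest(n).index]


df = pd.DataFrame(
    {
        "Australia": [9, 9],
        "Austria": [0, 9],
        "United Kingdom": [1, 4],
        "Vietnam": [2, 2],
    },
    index=pd.DatetimeIndex(["2020-01-30", "2020-01-31"], name="date"),
)

# Extra keyword arguments to pipe are passed on to the helper
result = df.pipe(top_columns_by_sum, n=2, threshold=4)
print(result)
```

Keeping the logic in a named function rather than a lambda makes the step reusable and keeps the pipeline readable when n or the threshold changes.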

Sort pandas dataframe by sum of columns

Question

5 answers

Answer 1
5 (accepted) 2020-03-19 13:58:19

Answer 2
4 2020-03-19 13:58:08

Answer 3
3 2020-03-19 13:58:52

Answer 4
1 2020-03-19 14:00:36

Answer 5
1 2021-01-15 22:47:58
