Sort the top N groups in a pandas df and group the rest as "other"

Question

Suppose I have a df:

import pandas as pd

dic = {'001': [14],
       '002': [3],
       '003': [2],
       '004': [6],
       '005': [7],
       '006': [1],
       '007': [2]}
df = pd.DataFrame.from_dict(dic,orient='index')
df.reset_index(inplace=True)
df = df.rename(columns = {'index':'id',0:'count'})
sorted = df.sort_values('count',ascending=False)
print(sorted)

This results in:

    id  count
0  001     14
4  005      7
3  004      6
1  002      3
2  003      2
6  007      2
5  006      1

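I want to keep the top 3 rows by the count column and group the rest as "other". I think I want to do something like not_top3 = sorted[3:], but I don't know how to rename the ids to "other" from there. After that, I assume I would use groupby and sum to collapse the rest.

The expected output would be: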

      id  count
0    001     14
1    005      7
2    004      6
3  other      8

where "other" is the sum of the counts of the remaining ids.

Answer 1

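You can use df.append to add a row at the bottom. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so the snippet below runs only on older versions.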

sorted_df = df.sort_values("count", ascending=False)
out = sorted_df.iloc[:3]
out.append(
    {"id": "others", "count": sorted_df["count"].iloc[3:].sum()},
    ignore_index=True,
)

       id  count
0     001     14
1     005      7
2     004      6
3  others      8
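
A minimal sketch of the same idea using pd.concat, which also works on pandas 2.0 and later. The variable names others and out are illustrative, and the df is rebuilt here directly from the question's data so the snippet runs on its own.

```python
import pandas as pd

# Same data as in the question, built directly with the final column names.
df = pd.DataFrame({
    "id": ["001", "002", "003", "004", "005", "006", "007"],
    "count": [14, 3, 2, 6, 7, 1, 2],
})

sorted_df = df.sort_values("count", ascending=False)

# One-row frame holding the sum of every count outside the top 3.
others = pd.DataFrame(
    {"id": ["others"], "count": [sorted_df["count"].iloc[3:].sum()]}
)

# ignore_index=True renumbers the rows 0..3 in the combined frame.
out = pd.concat([sorted_df.iloc[:3], others], ignore_index=True)
print(out)
```

Building the extra row as its own one-row DataFrame keeps the column dtypes explicit before concatenation.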

Answer 2

You can create a new id column in which every row whose count is not among the three largest is mapped to other, then aggregate to get the new dataframe:

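import numpy as np

# Sort first: groupby(sort=False) keeps groups in order of first appearance,
# so the descending order of counts is preserved in the result.
sorted_df = df.sort_values('count', ascending=False)

(sorted_df
 .assign(id=np.where(sorted_df['count'].isin(sorted_df['count'].nlargest(3)),
                     sorted_df['id'],
                     'other'))
 .groupby('id', as_index=False, sort=False)
 .sum()
)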

      id  count
0    001     14
1    005      7
2    004      6
3  other      8
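
One caveat with the isin approach: if several rows tie with the third-largest count, all of them match and more than three rows are kept. The following is a hedged sketch of a position-based variant that always keeps exactly n rows. The function name top_n_with_other and its parameters are hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def top_n_with_other(df, n=3, key="id", value="count", label="other"):
    """Keep the n largest rows by `value` and sum the rest under `label`."""
    ordered = df.sort_values(value, ascending=False)
    # Mark rows by position, so ties at the cutoff cannot enlarge the top set.
    is_top = np.arange(len(ordered)) < n
    # Series.where keeps the id where is_top is True and uses `label` elsewhere.
    ordered = ordered.assign(**{key: ordered[key].where(is_top, label)})
    # sort=False keeps groups in order of first appearance (descending counts).
    return ordered.groupby(key, as_index=False, sort=False)[value].sum()


df = pd.DataFrame({
    "id": ["001", "002", "003", "004", "005", "006", "007"],
    "count": [14, 3, 2, 6, 7, 1, 2],
})
result = top_n_with_other(df, n=3)
print(result)
```

Which of several tied rows ends up in the top n depends on the sort order, so this sketch fixes only the number of rows kept, not the tie-breaking rule.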

Answer 1: score 3, accepted, 2021-04-17 19:32:57

Answer 2: score 0, 2021-04-17 22:35:16
