Python Pandas: group by and sort along multiple columns

Question

I'm experimenting with the pandas groupby function, but there is something I can't get to work.

My data looks like this:

import pandas as pd

data = {
    'Color1': ["Blue", "Red", "Green", "Blue", "Red", "Green", "Blue", "Red", "Green"],
    'Color2': ["Purple", "Pink", "Yellow", "Purple", "Pink", "Yellow", "Brown", "White", "Grey"],
    'Value': [20, 20, 20, 25, 25, 25, 5, 55, 30],
}

df = pd.DataFrame(data)

I used groupby to do some sorting (the idea is to extract a top N from a larger dataset):

df2 = df.groupby(['Color1'], sort=True).sum()[['Value']].reset_index()
df2 = df2.sort_values(by=['Value'], ascending=False)
print(df2)

  Color1  Value
2    Red    100
1  Green     75
0   Blue     50

What I care about most, though, is how to add Color2 to the grouping and sorting while keeping the ordering on Color1, i.e. a result like this:

  Color1  Color2  Value
0    Red   White     55
1    Red    Pink     45
2  Green  Yellow     45
3  Green    Grey     30
4   Blue  Purple     45
5   Blue   Brown      5

Thank you very much for your help.

Answer 1

The problem is that the values are strings, so sum concatenates them instead of adding them.
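To make the concatenation behaviour concrete, here is a minimal sketch. The frame `raw` and its contents are hypothetical and not part of the question; `dtype=object` is set explicitly so that the columns hold plain Python strings.

```python
import pandas as pd

# Hypothetical sample: Value is stored as strings, not numbers.
raw = pd.DataFrame(
    {
        "Color1": ["Blue", "Blue", "Red"],
        "Value": ["20", "25", "55"],
    },
    dtype=object,
)

# Summing strings concatenates them within each group.
as_text = raw.groupby("Color1")["Value"].sum()

# After conversion to integers, sum adds the numbers.
raw["Value"] = raw["Value"].astype(int)
as_number = raw.groupby("Color1")["Value"].sum()

print(as_text.to_dict())
print(as_number.to_dict())
```

In this sketch the Blue group gives the string '2025' before the conversion and the integer 45 after it.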

The column needs to be converted to a numeric type:

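df = pd.DataFrame(data)
# Make sure Value is numeric so that sum adds instead of concatenating
df['Value'] = df['Value'].astype(int)
# sort=False keeps the groups in order of first appearance
df2 = df.groupby(['Color1', 'Color2'], sort=False)['Value'].sum().reset_index()

# Largest sums first
df2 = df2.sort_values(by=['Value'], ascending=False)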

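If you need to sort by Color1 and Color2 while preserving the order in which the Color1 values currently appear (rather than alphabetical order), use an ordered categorical: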

vals = df2['Color1'].unique()
df2['Color1'] = pd.Categorical(df2['Color1'], ordered=True, categories=vals)

df2 = df2.sort_values(['Color1','Color2'])
print(df2)

  Color1  Color2  Value
1    Red    Pink     45
4    Red   White     55
3   Blue   Brown      5
0   Blue  Purple     45
5  Green    Grey     30
2  Green  Yellow     45

Answer 2

Try:

>>> df.groupby(['Color1', 'Color2']).sum() \
      .sort_values(['Color1', 'Value'], ascending=False).reset_index()

  Color1  Color2  Value
0    Red   White     55
1    Red    Pink     45
2  Green  Yellow     45
3  Green    Grey     30
4   Blue  Purple     45
5   Blue   Brown      5
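Note that in this sample the descending alphabetical order of Color1 (Red, Green, Blue) happens to coincide with the descending group totals (100, 75, 50). If the groups should be ordered by their totals regardless of their names, one possible approach is sketched below. This is not taken from either answer; the helper column `Total` and the variable names are assumptions made for illustration.

```python
import pandas as pd

data = {
    "Color1": ["Blue", "Red", "Green", "Blue", "Red", "Green", "Blue", "Red", "Green"],
    "Color2": ["Purple", "Pink", "Yellow", "Purple", "Pink", "Yellow", "Brown", "White", "Grey"],
    "Value": [20, 20, 20, 25, 25, 25, 5, 55, 30],
}
df = pd.DataFrame(data)

# Sum Value per (Color1, Color2) pair.
pairs = df.groupby(["Color1", "Color2"], as_index=False)["Value"].sum()

# Attach each Color1 group's total so it can serve as the primary sort key.
# "Total" is a helper column introduced only for this sketch.
pairs["Total"] = pairs.groupby("Color1")["Value"].transform("sum")

# Order groups by total (descending), then rows by Value (descending).
# Color1 sits in between so that groups with equal totals stay contiguous.
result = (
    pairs.sort_values(["Total", "Color1", "Value"], ascending=[False, True, False])
    .drop(columns="Total")
    .reset_index(drop=True)
)
print(result)

# Top N rows per Color1 (here N = 1), keeping the order established above.
top1 = result.groupby("Color1", sort=False).head(1)
print(top1)
```

For the sample data, `result` has the same rows in the same order as the output shown above, and `top1` keeps the largest Color2 entry of each Color1 group.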


Answer 1
0 2021-12-13 09:16:14

Answer 2
0 2021-12-13 09:21:39
