Pandas: group by multiple columns, concatenate one column while summing another

Question

Suppose I have the following df:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        3.0    b      y       h
3        4.0    b      y       j
4        5.0    c      x       k
5        6.0    c      x       l
6        6.0    c      y       p

I want to group by the name and role columns, sum amount, and concatenate the desc values with a comma:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        7.0    b      y       h,j
4        11.0   c      x       k,l
6        6.0    c      y       p

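What is the correct way to approach this?

Side question: say df was read from a .csv that has other, unrelated columns. How do I do this calculation and then write a new .csv that keeps those other columns (same schema as the file that was read)?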

Answer 1

This may not be an exact duplicate, but there are a lot of existing questions about groupby agg.

df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})


    name    role    amount  desc
0   a       x       1.0     f
1   a       y       2.0     g
2   b       y       7.0     h,j
3   c       x       11.0    k,l
4   c       y       6.0     p
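For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question, applies the same groupby/agg, and reorders the columns to match the input. The variable name result and the final reordering step are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "amount": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    "name": ["a", "a", "b", "b", "c", "c", "c"],
    "role": ["x", "y", "y", "y", "x", "x", "y"],
    "desc": ["f", "g", "h", "j", "k", "l", "p"],
})

# Sum amount and join desc with commas within each (name, role) group.
result = df.groupby(["name", "role"], as_index=False).agg(
    {"amount": "sum", "desc": ",".join}
)

# Put the columns back in the order of the original frame.
result = result[df.columns.tolist()]
print(result)
```

Passing ",".join directly does the same thing as lambda x: ','.join(x). Note that it only works when every value in desc is a string, so missing values would need to be filled or dropped first.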

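Edit: if the dataframe has other columns, you can either aggregate them with 'first' or 'last', or, if their values are the same within each group, include them in the grouping.

Option 1: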

df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})

Option 2:

df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
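The two options only give the same rows when the extra columns are constant within each (name, role) group. The sketch below shows the difference on a small made-up frame; the column other1 and its values are hypothetical and chosen so that one group contains two different values.

```python
import pandas as pd

# Hypothetical frame: 'other1' is an illustrative extra column.
df = pd.DataFrame({
    "name": ["b", "b", "c"],
    "role": ["y", "y", "x"],
    "amount": [3.0, 4.0, 5.0],
    "desc": ["h", "j", "k"],
    "other1": ["u", "v", "w"],  # differs inside the (b, y) group
})

# Option 1: keep the first value of other1 seen in each group.
opt1 = df.groupby(["name", "role"], as_index=False).agg(
    {"amount": "sum", "desc": ",".join, "other1": "first"}
)

# Option 2: group by other1 as well, so differing values split the group.
opt2 = df.groupby(["name", "role", "other1"], as_index=False).agg(
    {"amount": "sum", "desc": ",".join}
)

print(opt1)
print(opt2)
```

Here Option 1 returns two rows (the b/y group is merged and keeps other1 = u), while Option 2 returns three rows because u and v become separate groups. Which one is right depends on whether the extra columns are really constant per group.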

Answer 2

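Expanding on @Vaishali's answer: to handle the remaining columns without having to list each one, you can build a dictionary and pass it as the argument to the agg(regate) function.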

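keys = ['key1', 'key2']
agg_funcs = {}  # named agg_funcs so the built-in dict is not shadowed
for col in df.columns:
    if col in keys:
        continue  # grouping columns are returned as keys, not aggregated
    if col == 'column_you_wish_to_merge':
        agg_funcs[col] = ' '.join
    else:
        agg_funcs[col] = 'first'  # or any other group aggregation operation

df.groupby(keys, as_index=False).agg(agg_funcs)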
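For the side question about CSV files, the same idea can be sketched as a round trip. This is one possible approach under stated assumptions, not the only way to do it: the CSV text, the extra column source, and the names special, agg_funcs and buffer are all hypothetical, and an in-memory buffer stands in for real file paths.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the 'source' column is hypothetical.
csv_text = """amount,name,role,desc,source
3.0,b,y,h,s1
4.0,b,y,j,s1
5.0,c,x,k,s2
"""
df = pd.read_csv(io.StringIO(csv_text))

keys = ["name", "role"]
special = {"amount": "sum", "desc": ",".join}

# Every non-key column gets its special rule, or 'first' by default.
agg_funcs = {
    col: special.get(col, "first") for col in df.columns if col not in keys
}

out = df.groupby(keys, as_index=False).agg(agg_funcs)
out = out[df.columns.tolist()]  # same column order as the input file

buffer = io.StringIO()  # pass a file path to to_csv to write to disk instead
out.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Reindexing with the original column list keeps the output schema identical to the input, and index=False stops pandas from writing the row index as an extra column.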

Answer 1: score 9, accepted, 2018-09-27 23:51:10
Answer 2: score 1, 2020-07-04 14:11:33
