Pandas: group by multiple columns, concatenate one column while summing another

Question

If I had the following df:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        3.0    b      y       h
3        4.0    b      y       j
4        5.0    c      x       k
5        6.0    c      x       l
6        6.0    c      y       p

I want to group by the name and role columns, sum the amount, and concatenate the desc values with a comma:

      amount   name   role    desc
0        1.0    a      x       f
1        2.0    a      y       g
2        7.0    b      y       h,j
4        11.0   c      x       k,l
6        6.0    c      y       p

What would be the correct way of approaching this?

Side question: if the df were read from a .csv with other, unrelated columns, how would I do this calculation and then write the result to a new .csv along with the other columns (same schema as the one read)?

Answer 1

This may not be an exact duplicate, but there are a lot of existing questions about groupby agg.

df.groupby(['name', 'role'], as_index=False)\
.agg({'amount':'sum', 'desc':lambda x: ','.join(x)})


    name    role    amount  desc
0   a       x       1.0     f
1   a       y       2.0     g
2   b       y       7.0     h,j
3   c       x       11.0    k,l
4   c       y       6.0     p
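For anyone who wants to run it directly, here is a self-contained version of the snippet above that rebuilds the df from the question. It is only a sketch: `result` is an illustrative name, and `','.join` is passed in place of the lambda, which should behave the same way here.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    'amount': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 6.0],
    'name':   ['a', 'a', 'b', 'b', 'c', 'c', 'c'],
    'role':   ['x', 'y', 'y', 'y', 'x', 'x', 'y'],
    'desc':   ['f', 'g', 'h', 'j', 'k', 'l', 'p'],
})

# Sum amount and join desc within each (name, role) group.
result = (
    df.groupby(['name', 'role'], as_index=False)
      .agg({'amount': 'sum', 'desc': ','.join})
)
print(result)
```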

Edit: If there are other columns in the dataframe, you can aggregate them using 'first' or 'last', or, if their values are identical within each group, include them in the grouping.

Option 1:

df.groupby(['name', 'role'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x), 'other1':'first', 'other2':'first'})

Option 2:

df.groupby(['name', 'role', 'other1', 'other2'], as_index=False).agg({'amount':'sum', 'desc':lambda x: ','.join(x)})
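For the side question about the .csv round trip, something along these lines might work. This is a sketch under assumptions: `other1` is a hypothetical unrelated column, and `io.StringIO` stands in for real file paths so the example is self-contained. Selecting `df.columns` at the end puts the columns back in the order they were read.

```python
import io
import pandas as pd

# Stand-in for a real input file; 'other1' is a hypothetical unrelated column.
csv_in = io.StringIO(
    "amount,name,role,desc,other1\n"
    "3.0,b,y,h,u\n"
    "4.0,b,y,j,u\n"
    "6.0,c,y,p,v\n"
)
df = pd.read_csv(csv_in)

# Option 1 from above: keep the first value of the unrelated column.
out = (
    df.groupby(['name', 'role'], as_index=False)
      .agg({'amount': 'sum', 'desc': ','.join, 'other1': 'first'})
)

# Restore the input column order so the output has the same schema.
out = out[df.columns]

csv_out = io.StringIO()  # pass a file path here in practice
out.to_csv(csv_out, index=False)
print(csv_out.getvalue())
```

With a real file, `pd.read_csv('in.csv')` and `out.to_csv('out.csv', index=False)` would replace the two StringIO objects (both file names are placeholders).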

Answer 2

Extending @Vaishali's answer: to handle the remaining columns without having to specify each one, you could build a dictionary and pass it as the argument to the agg (aggregate) function.

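keys = ['key1', 'key2']
agg_funcs = {}  # named agg_funcs to avoid shadowing the built-in dict
for col in df.columns:
    if col in keys:
        continue  # skip the grouping columns; groupby already returns them
    if col == 'column_you_wish_to_merge':
        agg_funcs[col] = ' '.join
    else:
        agg_funcs[col] = 'first'  # or any other group aggregation operation

df.groupby(keys, as_index=False).agg(agg_funcs)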
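The same idea can also be written as a dict comprehension. Here is a minimal sketch on a cut-down version of the question's data, assuming a hypothetical extra column `other1`. The names `special`, `agg_funcs` and `merged` are illustrative. The grouping columns are left out of the mapping, since groupby already returns them.

```python
import pandas as pd

df = pd.DataFrame({
    'name':   ['b', 'b', 'c'],
    'role':   ['y', 'y', 'y'],
    'amount': [3.0, 4.0, 6.0],
    'desc':   ['h', 'j', 'p'],
    'other1': ['u', 'u', 'v'],  # hypothetical extra column
})

keys = ['name', 'role']
special = {'amount': 'sum', 'desc': ','.join}  # columns with a specific rule

# Every non-key column gets its specific rule, or 'first' as the fallback.
agg_funcs = {
    col: special.get(col, 'first')
    for col in df.columns
    if col not in keys
}

merged = df.groupby(keys, as_index=False).agg(agg_funcs)
print(merged)
```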

Answer 1
9 accepted 2018-09-27 23:51:10

Answer 2
1 2020-07-04 14:11:33
