Pandas: An efficient way to concatenate values only in selected columns of a grouped DataFrame

Question

I have a df like this:

   LAST_MOD_DATE       ID    TITLE          TXT_ID             TXT  
0  1486047205463        2    TITLE-2        7                  ABC   
1  1486047205463        2    TITLE-2        5                  XYZ   
2  1486047205463        2    TITLE-2        6                  MNQ

I would like to group it by ID so as to flatten it into a single row. The fields with differing values, TXT_ID and TXT, should each be combined into one comma-separated value, something like below:

ID 
2  1486047205463     TITLE-2        7, 5, 6          ABC, XYZ, MNQ

I am able to get a single column out with:

df.groupby('ID')['TXT'].apply(lambda x:', '.join(x))

But how can I do this on the entire df, so that I selectively join some columns while just taking the top value of the other columns within each group? Right now I am doing it by aggregating the values as a set and then expanding the set for some columns, but this doesn't seem very efficient.

Answer 1

Use agg and specify the function you want to apply to each column. Here is a mixed example in which I group only by 'ID', to illustrate how to take the first element of 'TITLE'. For your sample you could group by 'TITLE' as well, though that might not hold in your general case:

df.groupby('ID').agg({'TITLE':'first', 
                      'TXT_ID':lambda x:', '.join(x),
                      'TXT':lambda x:', '.join(x)})
Out[288]: 
      TITLE   TXT_ID            TXT
ID                                 
2   TITLE-2  7, 5, 6  ABC, XYZ, MNQ
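
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same agg approach. It rebuilds the sample frame from the question and also keeps LAST_MOD_DATE via 'first'. Two things here are assumptions rather than part of the answer above: the helper name join_values is made up for illustration, and TXT_ID is assumed to be stored as integers, so the helper casts to str before joining (a plain ', '.join(x) would raise a TypeError on an integer column).

```python
import pandas as pd

# Sample data rebuilt from the question; TXT_ID is assumed to be an integer column.
df = pd.DataFrame({
    "LAST_MOD_DATE": [1486047205463] * 3,
    "ID": [2, 2, 2],
    "TITLE": ["TITLE-2"] * 3,
    "TXT_ID": [7, 5, 6],
    "TXT": ["ABC", "XYZ", "MNQ"],
})


def join_values(values):
    """Hypothetical helper: join one group's values into a comma-separated string."""
    # Cast to str first so that non-string columns such as TXT_ID can be joined.
    return ", ".join(values.astype(str))


# One aggregation per column: 'first' for columns that are constant within a group,
# join_values for the columns whose values should be concatenated.
result = df.groupby("ID").agg({
    "LAST_MOD_DATE": "first",
    "TITLE": "first",
    "TXT_ID": join_values,
    "TXT": join_values,
})
print(result)
```

If the joined columns are already strings, passing ', '.join directly (as in the answer) should behave the same way; the cast only matters for non-string dtypes.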

Answer 1: accepted, 2017-02-03 20:16:35
