Pandas: Efficient way to join values in only selected columns in a grouped dataframe
I have a df like this:
LAST_MOD_DATE ID TITLE TXT_ID TXT
0 1486047205463 2 TITLE-2 7 ABC
1 1486047205463 2 TITLE-2 5 XYZ
2 1486047205463 2 TITLE-2 6 MNQ
I would like to group it by ID so as to flatten it into a single row. The fields with differing values, TXT_ID and TXT, should each be combined into one comma-separated value. So, something like below:
ID
2 1486047205463 TITLE-2 7, 5, 6 ABC, XYZ, MNQ
I am able to get just a single column out with:
df.groupby('ID')['TXT'].apply(lambda x:', '.join(x))
But how can I do it on the entire df, so that I selectively join some columns while just taking the top value of the other columns within each group? Right now I am doing it by aggregating the values as a set and then expanding the set for some columns, but this doesn't seem very efficient.
Use agg and supply the function you want to apply to each column. Here is a mixed example that groups only on 'ID', to illustrate how to take the first element of 'TITLE'. For your sample you could group on 'TITLE' as well, though that might not fit your general case:
df.groupby('ID').agg({'TITLE': 'first',
                      'TXT_ID': lambda x: ', '.join(x),
                      'TXT': lambda x: ', '.join(x)})
Out[288]:
TITLE TXT_ID TXT
ID
2 TITLE-2 7, 5, 6 ABC, XYZ, MNQ
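For completeness, here is a self-contained sketch of the same idea that also carries LAST_MOD_DATE through with 'first', as in the desired output from the question. The sample frame is rebuilt by hand from the question, and TXT_ID is assumed to be stored as integers (the question does not say), so the values are converted with str before joining. The helper name join_values is made up for this illustration.

```python
import pandas as pd

# Sample data rebuilt from the question; TXT_ID as integers is an assumption.
df = pd.DataFrame({
    "LAST_MOD_DATE": [1486047205463] * 3,
    "ID": [2, 2, 2],
    "TITLE": ["TITLE-2"] * 3,
    "TXT_ID": [7, 5, 6],
    "TXT": ["ABC", "XYZ", "MNQ"],
})


def join_values(values):
    """Join a group's values into one comma-separated string.

    Hypothetical helper: str() is applied first so that non-string
    columns (such as an integer TXT_ID) can be joined as well.
    """
    return ", ".join(map(str, values))


# One aggregation per column: 'first' for the columns that are constant
# within a group, join_values for the columns that differ.
result = df.groupby("ID").agg({
    "LAST_MOD_DATE": "first",
    "TITLE": "first",
    "TXT_ID": join_values,
    "TXT": join_values,
})

print(result)
```

If TXT_ID is already a string column, the plain ', '.join(x) lambda from the answer above is enough.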