Pandas: group by one column, sum another, and join a third as comma-separated values
I want to group by one column (tag) and sum up the corresponding quantities (qty). The related reference numbers (the reference column) should be combined into one comma-separated string per tag.
import pandas as pd
tag = ['PO_001045M100960','PO_001045M100960','PO_001045MSP2526','PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191']
reference= ['PA_000003', 'PA_000005', 'PA_000001', 'PA_000002', 'PA_000004', 'PA_000009']
qty=[4,2,2,1,1,1]
df = pd.DataFrame({'tag' : tag, 'reference':reference, 'qty':qty})
tag reference qty
PO_001045M100960 PA_000003 4
PO_001045M100960 PA_000005 2
PO_001045MSP2526 PA_000001 2
PO_001045M870191 PA_000002 1
PO_001045M870191 PA_000004 1
PO_001045M870191 PA_000009 1
If I use df.groupby('tag')['qty'].sum().reset_index(), I get the following result.
tag qty
ASL_PO_000001045M100960 6
ASL_PO_000001045M870191 3
ASL_PO_000001045MSP2526 2
I need an additional column in which the reference numbers are listed under their respective tags, like this:
tag qty reference
ASL_PO_000001045M100960 6 PA_000003, PA_000005
ASL_PO_000001045M870191 3 PA_000002, PA_000004, PA_000009
ASL_PO_000001045MSP2526 2 PA_000001
How can I achieve this?
Thanks.
Use pandas.DataFrame.groupby together with agg, passing a different aggregation for each column:
df.groupby('tag').agg({'qty': 'sum', 'reference': ', '.join})
Output:
reference qty
tag
PO_001045M100960 PA_000003, PA_000005 6
PO_001045M870191 PA_000002, PA_000004, PA_000009 3
PO_001045MSP2526 PA_000001 2
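For reference, here is a self-contained sketch of the same approach using the data from the question. It adds as_index=False, which is not part of the answer above and is assumed here only so that tag stays a regular column, as in the layout the question asks for.

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    'tag': ['PO_001045M100960', 'PO_001045M100960', 'PO_001045MSP2526',
            'PO_001045M870191', 'PO_001045M870191', 'PO_001045M870191'],
    'reference': ['PA_000003', 'PA_000005', 'PA_000001',
                  'PA_000002', 'PA_000004', 'PA_000009'],
    'qty': [4, 2, 2, 1, 1, 1],
})

# Sum qty and join the reference strings within each tag.
# as_index=False (an addition to the answer) keeps tag as a column
# instead of moving it into the index.
result = df.groupby('tag', as_index=False).agg(
    {'qty': 'sum', 'reference': ', '.join}
)
print(result)
```

Calling .reset_index() on the answer's result should give an equivalent frame; which form to use is mostly a matter of taste.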
Note: if the reference column is numeric, ', '.join will not work, because str.join only accepts strings. In that case, convert each value first: use lambda x: ', '.join(str(i) for i in x)
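A minimal sketch of the numeric case follows. The frame df_num and its values are made up for illustration and do not come from the question; only the lambda is taken from the note above.

```python
import pandas as pd

# Hypothetical data: reference holds integers instead of strings.
df_num = pd.DataFrame({
    'tag': ['A', 'A', 'B'],
    'reference': [3, 5, 1],
    'qty': [4, 2, 2],
})

# Convert each value to str before joining, since ', '.join
# raises a TypeError when given non-string items.
result_num = df_num.groupby('tag', as_index=False).agg(
    {'qty': 'sum', 'reference': lambda x: ', '.join(str(i) for i in x)}
)
print(result_num)
```

An alternative under the same assumptions would be to convert the column once with df_num['reference'].astype(str) and then aggregate with the plain ', '.join.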