How to create a column with values from a column of duplicated rows, separated by commas, in a DataFrame in Python Pandas?
I have a Pandas DataFrame like the one below (the data types of "ID" and "COL1" are "object"):
ID | COL1 | COL2 | COL3
----|------|------|----
123 | ABc | 55 | G4
123 | Abc | 55 | G4
123 | DD | 55 | G4
44 | RoR | 41 | P0
44 | RoR | 41 | P0
55 | XX | 456 | RR
And I need to:
So as a result I need something like below:
ID | COL1_cum | COL1_num | COL2 | COL3
----|----------|----------|-----|-----
123 | ABc, DD | 2 | 55 | G4
44 | RoR | 1 | 41 | P0
55 | XX | 1 | 456 | RR
Explanation for COL1_num:
How can I do that in Python Pandas?
If there are only 2 columns in the input data, use DataFrame.drop_duplicates with an aggregate join:
df1 = df.drop_duplicates().groupby('ID')['COL1'].agg(','.join).reset_index(name='COL1_cum')
If the data can have more columns, specify the ones to deduplicate on:
df1 = (df.drop_duplicates(['ID','COL1'])
         .groupby('ID')['COL1']
         .agg(','.join)
         .reset_index(name='COL1_cum'))
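For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the question's table, with one assumption: the second row's COL1 is taken to be an exact repeat of the first ("ABc"), which is what the expected result implies.

```python
import pandas as pd

# Sample data modelled on the question. Assumption: the second row repeats
# "ABc" exactly, so ID 123 has two distinct COL1 values.
df = pd.DataFrame({
    "ID":   ["123", "123", "123", "44", "44", "55"],
    "COL1": ["ABc", "ABc", "DD", "RoR", "RoR", "XX"],
    "COL2": [55, 55, 55, 41, 41, 456],
    "COL3": ["G4", "G4", "G4", "P0", "P0", "RR"],
})

# Deduplicate on the (ID, COL1) pair only, then join the surviving
# COL1 values per ID into one comma-separated string.
df1 = (df.drop_duplicates(["ID", "COL1"])
         .groupby("ID")["COL1"]
         .agg(",".join)
         .reset_index(name="COL1_cum"))

print(df1)
```

Note that the comparison is case-sensitive, so "ABc" and "Abc" would be kept as two separate values.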
EDIT:
First remove duplicates across all columns:
df1 = df.drop_duplicates()
print (df1)
ID COL1 COL2 COL3
0 123 ABc 55 G4
2 123 DD 55 G4
3 44 RoR 41 P0
5 55 XX 456 RR
Then aggregate with join and size, and take the first value of each of the other columns (because their values are the same within each ID group):
df2 = (df1.groupby('ID', sort=False, as_index=False)
          .agg(COL1_cum=('COL1', ','.join),
               COL1_num=('COL1', 'size'),
               COL2=('COL2', 'first'),
               COL3=('COL3', 'first')))
print (df2)
ID COL1_cum COL1_num COL2 COL3
0 123 ABc,DD 2 55 G4
1 44 RoR 1 41 P0
2 55 XX 1 456 RR
EDIT2: If the real data are not duplicated across all columns, a possible solution:
df2 = (df.groupby('ID', sort=False, as_index=False)
         .agg(COL1_cum=('COL1', lambda x: ','.join(dict.fromkeys(x))),
              COL1_num=('COL1', 'nunique'),
              COL2=('COL2', 'first'),
              COL3=('COL3', 'first')))
print (df2)
ID COL1_cum COL1_num COL2 COL3
0 123 ABc,DD 2 55 G4
1 44 RoR 1 41 P0
2 55 XX 1 456 RR
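To see why this variant matters, here is a rough sketch under an invented scenario: COL4 is a hypothetical extra column (not in the question) that differs on every row, so no two rows are exact duplicates. In that case whole-row drop_duplicates removes nothing and 'size' overcounts, while dict.fromkeys plus 'nunique' still work on COL1 alone.

```python
import pandas as pd

# Hypothetical data: COL4 is an invented column that differs on every row,
# so df.drop_duplicates() has nothing to remove.
df = pd.DataFrame({
    "ID":   ["123", "123", "123", "44", "44", "55"],
    "COL1": ["ABc", "ABc", "DD", "RoR", "RoR", "XX"],
    "COL2": [55, 55, 55, 41, 41, 456],
    "COL3": ["G4", "G4", "G4", "P0", "P0", "RR"],
    "COL4": [1, 2, 3, 4, 5, 6],
})

# Whole-row deduplication followed by 'size' counts the repeated COL1 values too.
by_size = df.drop_duplicates().groupby("ID", sort=False)["COL1"].size()

# dict.fromkeys keeps the first occurrence of each value in its original order;
# 'nunique' counts distinct COL1 values regardless of the other columns.
df2 = (df.groupby("ID", sort=False, as_index=False)
         .agg(COL1_cum=("COL1", lambda x: ",".join(dict.fromkeys(x))),
              COL1_num=("COL1", "nunique"),
              COL2=("COL2", "first"),
              COL3=("COL3", "first")))

print(by_size.tolist())  # counts per ID when only exact row duplicates are dropped
print(df2)
```

Keep in mind that 'nunique' ignores missing values by default, so a group whose COL1 is entirely NaN would get a count of 0.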