
Remove duplicates while adding a column in a CSV file using Python

I have a CSV file that looks like this:

|innings |     bowler    |
|--------|---------------|
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      P Kumar  |
|1       |      Z Khan   |
|1       |      Z Khan   |
|1       |      Z Khan   |
|2       |      AB Dinda |
|2       |      AB Dinda |
|2       |      I Sharma |
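To experiment with the answer below, the sample data can be loaded into a DataFrame. This is a minimal sketch, assuming the real file is comma-separated with `innings` and `bowler` headers (the pipe table above is only a rendering). `io.StringIO` stands in for a file path; a name such as `bowlers.csv` would be a hypothetical placeholder.

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the table above; in practice you would
# pass a file path (e.g. "bowlers.csv", an assumed name) to pd.read_csv.
CSV_TEXT = """innings,bowler
1,P Kumar
1,P Kumar
1,P Kumar
1,P Kumar
1,Z Khan
1,Z Khan
1,Z Khan
2,AB Dinda
2,AB Dinda
2,I Sharma
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# innings is parsed as an integer column; bowler holds the names as strings.
print(df.dtypes)
```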

Desired output:

|innings |     bowler           |
|--------|----------------------|
|1       |    P Kumar,Z Khan    |
|2       |    AB Dinda,I Sharma |

Code I applied:

df.groupby(['innings']).bowler.sum().drop_duplicates(subset="bowler",keep='first',inplace=True)

But for some reason it gives me this error: `TypeError: drop_duplicates() got an unexpected keyword argument 'subset'`

Then I tried without `subset`, i.e. `drop_duplicates("bowler", keep='first', inplace=True)`, and now I get this error: `TypeError: drop_duplicates() got multiple values for argument 'keep'`
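Both errors have the same root cause: `df.groupby(['innings']).bowler.sum()` returns a `Series`, not a `DataFrame`, and `Series.drop_duplicates` has no `subset` parameter. In the pandas version the question was written against, `keep` is the first parameter of `Series.drop_duplicates`, so the positional `"bowler"` was bound to `keep` and `keep='first'` then supplied it a second time (newer pandas versions may word this error differently). Two further problems: `sum()` on strings concatenates them, so the duplicates are already merged into one string before `drop_duplicates` runs, and `inplace=True` makes the call return `None`. A small sketch that checks this, using a hypothetical cut-down frame:

```python
import inspect

import pandas as pd

# Hypothetical cut-down version of the question's data; object dtype is
# set explicitly so sum() concatenates plain Python strings.
df = pd.DataFrame({
    "innings": [1, 1, 1, 2, 2],
    "bowler": pd.Series(
        ["P Kumar", "P Kumar", "Z Khan", "AB Dinda", "I Sharma"], dtype=object
    ),
})

grouped = df.groupby(["innings"]).bowler.sum()

# Selecting a single column after groupby and aggregating yields a Series.
print(type(grouped).__name__)  # Series

# sum() on strings concatenates them: the duplicate names are fused into one
# string per innings, so there is nothing left for drop_duplicates to remove.
print(grouped.iloc[0])  # P KumarP KumarZ Khan

# Series.drop_duplicates has no `subset` parameter; DataFrame.drop_duplicates does.
print("subset" in inspect.signature(pd.Series.drop_duplicates).parameters)  # False
```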

Deduplicate before aggregating: use `DataFrame.drop_duplicates` on both columns first, then aggregate with `join`:

df = (df.drop_duplicates(subset=["bowler",'innings'])
        .groupby('innings')
        .bowler.agg(','.join)
        .reset_index())

print (df)
   innings             bowler
0        1     P Kumar,Z Khan
1        2  AB Dinda,I Sharma
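Since the question starts from a CSV file, here is a self-contained sketch of the same approach that also turns the result back into CSV. The sample data is rebuilt in code; `to_csv()` called without a path returns the CSV text, and passing a file name instead (e.g. a hypothetical `bowlers_by_innings.csv`) would write it to disk. Because `drop_duplicates` keeps the first occurrence by default and `groupby` preserves row order within each group, the names are listed in the order each bowler first appears.

```python
import pandas as pd

# Sample data matching the question's table.
df = pd.DataFrame({
    "innings": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "bowler": ["P Kumar"] * 4 + ["Z Khan"] * 3 + ["AB Dinda"] * 2 + ["I Sharma"],
})

result = (
    df.drop_duplicates(subset=["innings", "bowler"])  # one row per (innings, bowler) pair
      .groupby("innings")
      .bowler.agg(",".join)                           # join the remaining names per innings
      .reset_index()
)

# Without a path, to_csv returns the CSV text; fields containing commas are quoted.
csv_text = result.to_csv(index=False)
print(csv_text)
```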
