Remove duplicates while adding a column in a CSV file using Python
I have a CSV file that looks like this:
|innings | bowler |
|--------|---------------|
|1 | P Kumar |
|1 | P Kumar |
|1 | P Kumar |
|1 | P Kumar |
|1 | Z Khan |
|1 | Z Khan |
|1 | Z Khan |
|2 | AB Dinda |
|2 | AB Dinda |
|2 | I Sharma |
Desired output:
|innings | bowler |
|--------|----------------------|
|1 | P Kumar,Z Khan |
|2 | AB Dinda,I Sharma |
Code I applied:
df.groupby(['innings']).bowler.sum().drop_duplicates(subset="bowler",keep='first',inplace=True)
But for some reason it gives me this error: TypeError: drop_duplicates() got an unexpected keyword argument 'subset'
Then I tried without subset: drop_duplicates("bowler",keep='first', inplace=True). Now I am getting this error: TypeError: drop_duplicates() got multiple values for argument 'keep'
Use DataFrame.drop_duplicates on both columns first, then group by innings and aggregate bowler with join:
df = (df.drop_duplicates(subset=["bowler",'innings'])
        .groupby('innings')
        .bowler.agg(','.join)
        .reset_index())
print(df)
   innings             bowler
0        1     P Kumar,Z Khan
1        2  AB Dinda,I Sharma
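As for why the original attempt fails: df.groupby(['innings']).bowler.sum() returns a Series, and Series.drop_duplicates has no subset parameter (only DataFrame.drop_duplicates does). In the second attempt, "bowler" was most likely bound positionally to keep, which then clashed with keep='first'. The following is a minimal, self-contained sketch of the approach above. The sample data is rebuilt by hand from the table in the question, and the name result is only illustrative:

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption:
# the real data would normally come from pd.read_csv on the CSV file).
df = pd.DataFrame({
    "innings": [1, 1, 1, 1, 1, 1, 1, 2, 2, 2],
    "bowler": ["P Kumar"] * 4 + ["Z Khan"] * 3 + ["AB Dinda"] * 2 + ["I Sharma"],
})

# Drop repeated (innings, bowler) pairs while the data is still a DataFrame,
# then join the remaining bowler names within each innings.
result = (
    df.drop_duplicates(subset=["innings", "bowler"])
      .groupby("innings")["bowler"]
      .agg(",".join)
      .reset_index()
)
print(result)
```

By default groupby sorts the group keys, so the innings appear in ascending order. Passing sort=False to groupby should instead keep them in order of first appearance.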