Remove duplicates while adding a column in a CSV file using Python
I have a CSV file that looks like this:
|innings | bowler |
|--------|---------------|
|1 | P Kumar |
|1 | P Kumar |
|1 | P Kumar |
|1 | P Kumar |
|1 | Z Khan |
|1 | Z Khan |
|1 | Z Khan |
|2 | AB Dinda |
|2 | AB Dinda |
|2 | I Sharma |
Desired output:
|innings | bowler |
|--------|----------------------|
|1 | P Kumar,Z Khan |
|2 | AB Dinda,I Sharma |
The code I tried:
df.groupby(['innings']).bowler.sum().drop_duplicates(subset="bowler",keep='first',inplace=True)
But for some reason it gives me this error: TypeError: drop_duplicates() got an unexpected keyword argument 'subset'
Then I tried it without the subset keyword: drop_duplicates("bowler",keep='first', inplace=True) and now I get this error: TypeError: drop_duplicates() got multiple values for argument 'keep'
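As a rough illustration of why the first call fails: selecting a single column from the groupby and aggregating it returns a Series, and Series.drop_duplicates has no subset parameter (only DataFrame.drop_duplicates does). The sketch below uses a shortened, made-up sample of the data and substitutes agg(",".join) for sum() purely for illustration; the names grouped and subset_error are not from the question.

```python
import pandas as pd

# Shortened sample of the data in the question (illustrative only).
df = pd.DataFrame({
    "innings": [1, 1, 1, 2, 2],
    "bowler": ["P Kumar", "P Kumar", "Z Khan", "AB Dinda", "I Sharma"],
})

# Aggregating one column of a groupby yields a Series, not a DataFrame.
grouped = df.groupby("innings")["bowler"].agg(",".join)
print(type(grouped).__name__)

# Series.drop_duplicates does not accept `subset`, so this raises TypeError.
try:
    grouped.drop_duplicates(subset="bowler")
    subset_error = None
except TypeError as exc:
    subset_error = exc
print(subset_error)
```

Note also that with inplace=True the method returns None, so chaining it at the end of an expression would not give back the de-duplicated data even if the call succeeded.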
First use DataFrame.drop_duplicates on both columns, then aggregate with join:
df = (df.drop_duplicates(subset=["bowler", "innings"])
        .groupby('innings')
        .bowler.agg(','.join)
        .reset_index())
print(df)
innings bowler
0 1 P Kumar,Z Khan
1 2 AB Dinda,I Sharma
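For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the file is comma-separated with the two column headers shown in the question; the inline csv_text and the name result are stand-ins for illustration, and in practice you would pass your own file path to pd.read_csv.

```python
import io

import pandas as pd

# Inline stand-in for the CSV file in the question (assumed comma-separated).
csv_text = """innings,bowler
1,P Kumar
1,P Kumar
1,P Kumar
1,P Kumar
1,Z Khan
1,Z Khan
1,Z Khan
2,AB Dinda
2,AB Dinda
2,I Sharma
"""

df = pd.read_csv(io.StringIO(csv_text))

result = (
    df.drop_duplicates(subset=["innings", "bowler"])  # one row per (innings, bowler)
      .groupby("innings")["bowler"]
      .agg(",".join)                                  # join the remaining names per innings
      .reset_index()
)
print(result)
```

Dropping duplicates before grouping keeps the first occurrence of each pair, so the bowlers appear in the order they first show up in the file.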