Pandas drop_duplicates Issue
I want to remove duplicate rows based on a single column. The column is labelled 'sent_name'.
print(new_df)
sent_name \
0 Abbey Road Station, London, UK
1 Abbey Wood Station, London, UK
2 Acton Station, London, UK
3 Acton Central Station, London, UK
Name Lat Lng \
0 Abbey Road, London E15, UK 51.531930 0.003760
1 Abbey Wood, London SE2, UK 51.491060 0.121420
2 Station Parade, West Acton London Underground ... 51.518055 -0.281053
3 Acton Central, London W3, UK 51.508720 -0.262950
type
0 [u'transit_station', u'point_of_interest', u'e...
1 [u'transit_station', u'point_of_interest', u'e...
2 [u'train_station', u'transit_station', u'point...
3 [u'transit_station', u'point_of_interest', u'e...
I have tried
new_df.drop_duplicates(["sent_name"])
and
new_df.drop_duplicates(subset="sent_name")
On inspection, neither of these removes all of the duplicates.
For example,
1038 Woodford Station, London, UK
1040 Woodford Station, London, UK
1041 Woodford Station, London, UK
1043 Woodford Station, London, UK
1044 Woodford Station, London, UK
1038 South Woodford London Underground Station, Geo... 51.591789 0.027315
1040 Woodford, Woodford, Woodford Green, Greater Lo... 51.606900 0.034000
1041 South Woodford, London E18, UK 51.591910 0.027360
1043 South Woodford (Stop C), London E18, UK 51.591312 0.029013
1044 South Woodford (Stop D), London E18, UK 51.592010 0.027658
1038 [u'train_station', u'transit_station', u'point...
1040 [u'transit_station', u'point_of_interest', u'e...
1041 [u'transit_station', u'point_of_interest', u'e...
1043 [u'transit_station', u'point_of_interest', u'e...
1044 [u'transit_station', u'point_of_interest', u'e...
You need to assign the result of drop_duplicates. By default inplace=False, so the call returns a new DataFrame and leaves new_df unchanged, as nearly all pandas operations do.
So either:
new_df = new_df.drop_duplicates(["sent_name"])
or
new_df.drop_duplicates(["sent_name"], inplace=True)
will work.
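To make the difference concrete, here is a minimal sketch using a made-up four-row frame. Only the column names sent_name, Lat and Lng come from the question; the rows are a hypothetical miniature, and deduped is just an illustrative variable name. It shows that calling drop_duplicates without assigning leaves the frame as it was, while assigning the result gives one row per sent_name.

```python
import pandas as pd

# Hypothetical miniature of new_df; only the column names come from the question.
new_df = pd.DataFrame({
    "sent_name": [
        "Abbey Road Station, London, UK",
        "Woodford Station, London, UK",
        "Woodford Station, London, UK",
        "Woodford Station, London, UK",
    ],
    "Lat": [51.531930, 51.591789, 51.606900, 51.591910],
    "Lng": [0.003760, 0.027315, 0.034000, 0.027360],
})

# Without assignment the returned copy is discarded and new_df is unchanged.
new_df.drop_duplicates(subset="sent_name")
rows_without_assignment = len(new_df)

# Assigning the result keeps the first row seen for each sent_name.
deduped = new_df.drop_duplicates(subset="sent_name")
rows_after_assignment = len(deduped)
remaining_index = deduped.index.tolist()

print(rows_without_assignment, rows_after_assignment, remaining_index)
```

The surviving rows keep their original index labels, so the index can have gaps after deduplication; reset_index(drop=True) renumbers it if that matters.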