Pandas drop_duplicates Issue

I wish to remove duplicate entries by specifying a particular column. The column is labelled 'sent_name'.

print(new_df)

                                  sent_name  \
0            Abbey Road Station, London, UK   
1            Abbey Wood Station, London, UK   
2                 Acton Station, London, UK   
3         Acton Central Station, London, UK 


                                                Name        Lat       Lng  \
0                            Abbey Road, London E15, UK  51.531930  0.003760   
1                            Abbey Wood, London SE2, UK  51.491060  0.121420   
2     Station Parade, West Acton London Underground ...  51.518055 -0.281053   
3                          Acton Central, London W3, UK  51.508720 -0.262950   

                                                   type  
0     [u'transit_station', u'point_of_interest', u'e...  
1     [u'transit_station', u'point_of_interest', u'e...  
2     [u'train_station', u'transit_station', u'point...  
3     [u'transit_station', u'point_of_interest', u'e... 

I have tried

new_df.drop_duplicates(["sent_name"])

and

new_df.drop_duplicates(subset="sent_name")

On inspection, neither of these removes all of the duplicates.

For example,

1038           Woodford Station, London, UK   
1040           Woodford Station, London, UK   
1041           Woodford Station, London, UK   
1043           Woodford Station, London, UK   
1044           Woodford Station, London, UK   

1038  South Woodford London Underground Station, Geo...  51.591789  0.027315   
1040  Woodford, Woodford, Woodford Green, Greater Lo...  51.606900  0.034000   
1041                     South Woodford, London E18, UK  51.591910  0.027360   
1043            South Woodford (Stop C), London E18, UK  51.591312  0.029013   
1044            South Woodford (Stop D), London E18, UK  51.592010  0.027658   

1038  [u'train_station', u'transit_station', u'point...  
1040  [u'transit_station', u'point_of_interest', u'e...  
1041  [u'transit_station', u'point_of_interest', u'e...  
1043  [u'transit_station', u'point_of_interest', u'e...  
1044  [u'transit_station', u'point_of_interest', u'e...  

You need to assign the result of drop_duplicates. By default inplace=False, so the call returns a new DataFrame and leaves new_df unchanged, as nearly all pandas operations do.

So either:

new_df = new_df.drop_duplicates(["sent_name"])

or

new_df.drop_duplicates(["sent_name"], inplace=True)

will work.
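
To see the difference between calling drop_duplicates with and without assignment, here is a minimal sketch. The small DataFrame is a made-up stand-in for new_df (the station rows and the variable names rows_without_assignment, deduped and rows_after_assignment are illustrative assumptions, not the asker's data):

```python
import pandas as pd

# Hypothetical miniature of new_df; the values are illustrative only.
new_df = pd.DataFrame({
    "sent_name": [
        "Abbey Road Station, London, UK",
        "Woodford Station, London, UK",
        "Woodford Station, London, UK",
        "Woodford Station, London, UK",
    ],
    "Name": ["Abbey Road", "South Woodford", "Woodford", "South Woodford (Stop C)"],
})

# Without assignment the result is discarded; new_df itself is unchanged.
new_df.drop_duplicates(["sent_name"])
rows_without_assignment = len(new_df)

# Assigning the result keeps only the first row for each sent_name.
deduped = new_df.drop_duplicates(subset="sent_name")
rows_after_assignment = len(deduped)

print(rows_without_assignment, rows_after_assignment)  # 4 2
```

Note that the surviving rows keep their original index labels, so gaps in the index after deduplication are expected; chaining reset_index(drop=True) renumbers them if that matters.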
