Unable to remove duplicates from a DataFrame using drop_duplicates

Question

I am using a DataFrame from pandas in Python.

The DataFrames I will be referring to were created in the following way:

from pandas import DataFrame, merge

search = DataFrame([[262,'ny','20'],[515,'paris','19'],[669,'ldn','10'],[669,'ldn',10],[669,'ldn',5]],columns=['subscriber_id','location','radius'])

title = DataFrame([[262,'director'],[515,'artist'],[669,'scientist']],columns=['subscriber_id','title'])

The title and search DataFrames are then merged:

mergedTable = merge(title, search, on='subscriber_id', how='outer')

This produces the following DataFrame:

   subscriber_id      title location radius
0            262   director       ny     20
1            515     artist    paris     19
2            669  scientist      ldn     10
3            669  scientist      ldn     10
4            669  scientist      ldn      5

As we can see, it has been merged correctly, so a subscriber's data now spans multiple rows, one per search.

I do not want to get rid of subscribers that have multiple rows with different values, but I do want to get rid of duplicate rows.

This is the desired final result:

   subscriber_id      title location radius
0            262   director       ny     20
1            515     artist    paris     19
2            669  scientist      ldn     10
4            669  scientist      ldn      5

Row 3, a duplicate of row 2, is removed.

I have been researching this, and it seems that drop_duplicates() should work, i.e.:

mergedTable.drop_duplicates()

But this doesn't work; no rows are removed. Are there any tips or solutions?

Answer 1

Your radius column has dtype object because it contains some strings, for example in [669,'ldn','10'], and '10' != 10, so rows 2 and 3 are not treated as equal. Converting to integer will do the trick:

>>> mergedTable.radius = mergedTable.radius.astype(int)
>>> mergedTable.drop_duplicates()
   subscriber_id      title location  radius
0            262   director       ny      20
1            515     artist    paris      19
2            669  scientist      ldn      10
4            669  scientist      ldn       5
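
To check whether a column really has mixed types before converting it, something like the following sketch may help. It rebuilds the data from the question, counts the Python types stored in radius, and then converts the column. Note that it uses pd.to_numeric as an alternative to astype(int), and that the names merged, type_counts, rows_before and deduped are only illustrative and do not come from the question.

```python
import pandas as pd

# Rebuild the question's data; radius mixes strings ('10') and ints (10).
search = pd.DataFrame(
    [[262, 'ny', '20'], [515, 'paris', '19'], [669, 'ldn', '10'],
     [669, 'ldn', 10], [669, 'ldn', 5]],
    columns=['subscriber_id', 'location', 'radius'])
title = pd.DataFrame(
    [[262, 'director'], [515, 'artist'], [669, 'scientist']],
    columns=['subscriber_id', 'title'])
merged = pd.merge(title, search, on='subscriber_id', how='outer')

# Diagnose: count which Python types are actually stored in the column.
type_counts = (merged['radius']
               .map(lambda v: type(v).__name__)
               .value_counts()
               .to_dict())
print(type_counts)

# With mixed types, '10' and 10 compare unequal, so nothing is dropped yet.
rows_before = len(merged.drop_duplicates())

# Fix: make the column numeric, then drop duplicates.
# drop_duplicates() returns a new DataFrame, so keep the result.
merged['radius'] = pd.to_numeric(merged['radius'])
deduped = merged.drop_duplicates()
print(deduped)
```

If the column might contain values that cannot be parsed as numbers, pd.to_numeric raises an error by default, which is usually preferable to silently keeping mixed types.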

Answer 1: 3, accepted, 2013-12-02 18:40:44
