Can't remove duplicates from DataFrame with drop_duplicates
I am using the pandas DataFrame in Python.
The DataFrame I will be referring to was created as follows:
from pandas import DataFrame, merge
search = DataFrame([[262, 'ny', '20'], [515, 'paris', '19'], [669, 'ldn', '10'], [669, 'ldn', 10], [669, 'ldn', 5]], columns=['subscriber_id', 'location', 'radius'])
title = DataFrame([[262, 'director'], [515, 'artist'], [669, 'scientist']], columns=['subscriber_id', 'title'])
The title and search DataFrames are then merged:
mergedTable = merge(title, search, on='subscriber_id', how='outer')
This produces the following DataFrame:
   subscriber_id      title  location  radius
0            262   director        ny      20
1            515     artist     paris      19
2            669  scientist       ldn      10
3            669  scientist       ldn      10
4            669  scientist       ldn       5
As we can see, the merge worked, so a subscriber now has one row per search.
I do not want to drop subscribers that have multiple rows with different values, but I do want to get rid of duplicate rows.
This is the desired final result:
   subscriber_id      title  location  radius
0            262   director        ny      20
1            515     artist     paris      19
2            669  scientist       ldn      10
4            669  scientist       ldn       5
Row 3, a duplicate of row 2, is removed.
I have been researching this and it seems that drop_duplicates() should work, i.e.
mergedTable.drop_duplicates()
But this doesn't work; no rows are removed. Any tips/solutions available?
Your radius column is of dtype object because some of its values are strings, e.g. [669,'ldn','10'], and '10' != 10. Converting to integer will do the trick:
>>> mergedTable.radius = mergedTable.radius.astype(int)
>>> mergedTable.drop_duplicates()
   subscriber_id      title  location  radius
0            262   director        ny      20
1            515     artist     paris      19
2            669  scientist       ldn      10
4            669  scientist       ldn       5
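For anyone who wants to check this themselves, here is a minimal, self-contained sketch of the problem and the fix. The variable names (merged, type_counts, before, after) are my own, and I use pd.to_numeric here instead of astype(int); that is a swapped-in alternative which should give the same result for this data.

```python
import pandas as pd

# Same data as in the question: note the mix of '10' (str) and 10 (int).
search = pd.DataFrame(
    [[262, 'ny', '20'], [515, 'paris', '19'], [669, 'ldn', '10'],
     [669, 'ldn', 10], [669, 'ldn', 5]],
    columns=['subscriber_id', 'location', 'radius'],
)
title = pd.DataFrame(
    [[262, 'director'], [515, 'artist'], [669, 'scientist']],
    columns=['subscriber_id', 'title'],
)
merged = pd.merge(title, search, on='subscriber_id', how='outer')

# Diagnose: count the Python types actually stored in the object column.
type_counts = merged['radius'].map(lambda v: type(v).__name__).value_counts()
print(merged['radius'].dtype)   # object dtype signals mixed/str values
print(type_counts)

# With mixed types, '10' and 10 compare unequal, so nothing is dropped.
before = merged.drop_duplicates()

# Fix: coerce the column to a numeric dtype, then drop duplicates.
merged['radius'] = pd.to_numeric(merged['radius'])
after = merged.drop_duplicates()

print(len(before), len(after))
```

Checking the per-value types first is useful because dtype object on its own does not tell you whether the column is all strings or a mix. If the column could contain non-numeric strings, pd.to_numeric raises by default, which is usually what you want while debugging.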