Python pandas dataframe, check if column value matches other column value for every row
Imagine having the columns Username and UserId.
Username UserId
user1 1
user1 1
user2 2
user3 1 <- this is wrong for example
user1 1
User3 has the same UserId as user1, which is not supposed to be possible. How can I check if any of these occurrences exist?
First remove all duplicates by both columns:
df1 = df.drop_duplicates(['Username','UserId'])
print (df1)
Username UserId
0 user1 1
2 user2 2
3 user3 1
Then get all duplicates by UserId. More logic is still needed here to distinguish whether the wrong value belongs to user1 or user3:
dups = df1[df1['UserId'].duplicated(keep=False)]
print (dups)
Username UserId
0 user1 1
3 user3 1
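The two steps above can be wrapped up as a small self-contained sketch. The helper name `find_conflicting_ids` is hypothetical (it is not part of pandas), and the `groupby(...).nunique()` check at the end is an alternative way to get just the offending IDs, not the method used in this answer:

```python
import pandas as pd

# Sample data from the question: user3 wrongly shares UserId 1 with user1.
df = pd.DataFrame({
    "Username": ["user1", "user1", "user2", "user3", "user1"],
    "UserId": [1, 1, 2, 1, 1],
})


def find_conflicting_ids(frame):
    """Hypothetical helper: unique (Username, UserId) pairs whose UserId
    is used by more than one username."""
    unique_pairs = frame.drop_duplicates(["Username", "UserId"])
    return unique_pairs[unique_pairs["UserId"].duplicated(keep=False)]


conflicts = find_conflicting_ids(df)
has_conflicts = not conflicts.empty  # True if any UserId is shared

# Alternative check: count distinct usernames per UserId.
names_per_id = df.groupby("UserId")["Username"].nunique()
bad_ids = names_per_id[names_per_id > 1].index.tolist()

print(conflicts)
print(bad_ids)
```

If only a yes/no answer is needed, `has_conflicts` is enough; `conflicts` shows which usernames collide on each ID.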
Sample data, with an extra user4 added to make the example more informative:
print (df)
Username UserId
0 user1 1
1 user1 1
2 user2 2
3 user3 1
4 user4 1
5 user4 1
6 user1 1
One idea is to get the counts per group by both columns with GroupBy.transform:
df['count'] = df.groupby(['Username','UserId'])['UserId'].transform('size')
print (df)
Username UserId count
0 user1 1 3
1 user1 1 3
2 user2 2 1
3 user3 1 1
4 user4 1 2
5 user4 1 2
6 user1 1 3
Then remove duplicates and sort with DataFrame.sort_values:
df1 = df.drop_duplicates(['Username','UserId']).sort_values(['UserId','count'])
print (df1)
Username UserId count
3 user3 1 1
4 user4 1 2
0 user1 1 3
2 user2 2 1
Get all duplicates:
mask1 = df1['UserId'].duplicated(keep=False)
dups = df1[mask1]
print (dups)
Username UserId count
3 user3 1 1
4 user4 1 2
0 user1 1 3
Remove the duplicate with the maximal count by chaining Series.duplicated with keep='last':
dups_without_max_count = df1[df1['UserId'].duplicated(keep='last') & mask1]
print (dups_without_max_count)
Username UserId count
3 user3 1 1
4 user4 1 2
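For reference, the whole count-based pipeline can be sketched as one function that does not modify the original DataFrame. The function name `flag_minority_usernames` is hypothetical, and the sketch rests on the same assumption as the steps above: the username seen most often for a given UserId is treated as the correct one. If two usernames tie on count, which one is kept is arbitrary.

```python
import pandas as pd

# Extended sample data from the answer (user4 added).
df = pd.DataFrame({
    "Username": ["user1", "user1", "user2", "user3", "user4", "user4", "user1"],
    "UserId": [1, 1, 2, 1, 1, 1, 1],
})


def flag_minority_usernames(frame):
    """Hypothetical helper: for each UserId shared by several usernames,
    return every username except the most frequent one."""
    # Count how often each (Username, UserId) pair occurs.
    counted = frame.assign(
        count=frame.groupby(["Username", "UserId"])["UserId"].transform("size")
    )
    # One row per pair, ordered so the highest count comes last within each UserId.
    pairs = (
        counted.drop_duplicates(["Username", "UserId"])
        .sort_values(["UserId", "count"])
    )
    shared = pairs["UserId"].duplicated(keep=False)        # UserId used by >1 username
    not_last = pairs["UserId"].duplicated(keep="last")     # all but the max-count row
    return pairs[shared & not_last]


suspects = flag_minority_usernames(df)
print(suspects)
```

Using `assign` instead of `df['count'] = ...` keeps the input untouched, which is convenient when the check is run repeatedly on the same data.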