Only get duplicated values within groups with pandas
I have a data frame such as:
groups ids numbers
group3 id4 89
group1 id1 50
group1 id1 30
group1 id2 90
group2 id4 89
group2 id6 76
group3 id4 90
and the idea is to use groupby on groups to find the duplicated ids and get a new data frame containing only the ids that are duplicated within each group, such as:
group1 id1 50
group1 id1 30
group3 id4 89
group3 id4 90
I tried:
for groups in df.groupby('groups'):
    print(df['ids'].duplicated)
Thanks for your help.
The groupby function is not necessary here; for better performance use DataFrame.duplicated with multiple columns and the parameter keep=False to flag all duplicates, then filter by boolean indexing:
df = df[df.duplicated(['groups','ids'], keep=False)]
print (df)
groups ids numbers
0 group3 id4 89
1 group1 id1 50
2 group1 id1 30
6 group3 id4 90
If sorting is necessary, add DataFrame.sort_values with DataFrame.reset_index for a default index:
df = (df[df.duplicated(['groups','ids'], keep=False)]
.sort_values(['groups','ids'])
.reset_index(drop=True))
print (df)
groups ids numbers
0 group1 id1 50
1 group1 id1 30
2 group3 id4 89
3 group3 id4 90
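For reference, the steps above can be reproduced end to end with a minimal self-contained script (the DataFrame is built from the question's sample data):

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'groups': ['group3', 'group1', 'group1', 'group1', 'group2', 'group2', 'group3'],
    'ids': ['id4', 'id1', 'id1', 'id2', 'id4', 'id6', 'id4'],
    'numbers': [89, 50, 30, 90, 89, 76, 90],
})

# duplicated(..., keep=False) marks every member of a repeated
# (groups, ids) pair as True, so boolean indexing keeps them all
out = (df[df.duplicated(['groups', 'ids'], keep=False)]
         .sort_values(['groups', 'ids'])
         .reset_index(drop=True))
print(out)
```

Multi-column sort_values uses a stable lexsort, so rows within the same (groups, ids) pair keep their original relative order (50 before 30 for group1/id1).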
You can use:
(df.groupby('groups')
   .apply(lambda x: x[x.duplicated('ids', keep=False)])
   .reset_index(drop=True))
Output:
groups ids numbers
0 group1 id1 50
1 group1 id1 30
2 group3 id4 89
3 group3 id4 90
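If you do want a groupby-based variant that avoids apply, a common alternative (not from the answers above, shown here as a sketch) is to count each (groups, ids) pair with transform('size') and keep rows whose pair occurs more than once; this preserves the original row order:

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    'groups': ['group3', 'group1', 'group1', 'group1', 'group2', 'group2', 'group3'],
    'ids': ['id4', 'id1', 'id1', 'id2', 'id4', 'id6', 'id4'],
    'numbers': [89, 50, 30, 90, 89, 76, 90],
})

# Per-row size of its (groups, ids) group; size > 1 means duplicated
mask = df.groupby(['groups', 'ids'])['ids'].transform('size') > 1
out = df[mask].reset_index(drop=True)
print(out)
```

Unlike the duplicated-based solution, no sorting happens, so the rows stay in their input order (group3 first).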