Find matching rows based on a conditional grouping in a pandas dataframe
I've looked everywhere for this answer, but none seem to do what I need. Here's a dummy example of what I need:
import pandas as pd

data = {'id': [1, 2, 3, 4, 1, 1, 3, 4, 1],
        'parent': ['a', 'b', 'f', 'j', 'a', 'n', 'f', 'z', 'x'],
        'vehicle': ['car', 'car', 'truck', 'suv', 'car', 'hatch', 'truck', 'suv', 'car'],
        'color': ['red', 'blue', 'grey', 'green', 'red', 'purple', 'grey', 'green', 'red'],
        'serial': [324234, 23464, 5667, 1245, 786, 34546, 8537, 111111, 8376251537]}
df = pd.DataFrame(data)
df.sort_values(by=['id', 'parent'], inplace=True)
   id parent vehicle   color      serial
0   1      a     car     red      324234
4   1      a     car     red         786
5   1      n   hatch  purple       34546
8   1      x     car     red  8376251537
1   2      b     car    blue       23464
2   3      f   truck    grey        5667
6   3      f   truck    grey        8537
3   4      j     suv   green        1245
7   4      z     suv   green      111111
What I need is to get all rows where the id is the same but the parent differs, and the vehicle and color are the same.
So I want:
   id parent vehicle  color      serial
0   1      a     car    red      324234
4   1      a     car    red         786
8   1      x     car    red  8376251537
3   4      j     suv  green        1245
7   4      z     suv  green      111111
Note that I want to include the top two rows of the above because they have different serial numbers. Edit: and they are part of a grouping that has differing parents with the same id.
I've tried this and got close:
target = df[df.duplicated(['id', 'vehicle', 'color'], keep=False)]
   id parent vehicle  color      serial
0   1      a     car    red      324234
4   1      a     car    red         786
8   1      x     car    red  8376251537
2   3      f   truck   grey        5667
6   3      f   truck   grey        8537
3   4      j     suv  green        1245
7   4      z     suv  green      111111
But I don't want the rows that have matching id, vehicle, and color if the corresponding parent is also the same. So in this case, I don't want
   id parent vehicle color  serial
2   3      f   truck  grey    5667
6   3      f   truck  grey    8537
because they have the same parent. I've thought about grouping and changing the index, but what I'm doing isn't working. This seems like an easy problem, and maybe it is, but I just can't crack it!
IIUC, let's try this:
df[df.groupby(['id','vehicle','color'])['parent'].transform('nunique') > 1]
Output:
   id parent vehicle  color      serial
0   1      a     car    red      324234
4   1      a     car    red         786
8   1      x     car    red  8376251537
3   4      j     suv  green        1245
7   4      z     suv  green      111111
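To see why this works, it may help to split the one-liner into steps. The sketch below rebuilds the frame from the question and keeps the per-group count as a separate Series; the names parents_per_group, result, and alt are only illustrative and do not come from the original posts. The groupby(...).filter(...) variant at the end is an alternative I believe is equivalent here, not part of the answer above.

```python
import pandas as pd

data = {
    'id': [1, 2, 3, 4, 1, 1, 3, 4, 1],
    'parent': ['a', 'b', 'f', 'j', 'a', 'n', 'f', 'z', 'x'],
    'vehicle': ['car', 'car', 'truck', 'suv', 'car', 'hatch', 'truck', 'suv', 'car'],
    'color': ['red', 'blue', 'grey', 'green', 'red', 'purple', 'grey', 'green', 'red'],
    'serial': [324234, 23464, 5667, 1245, 786, 34546, 8537, 111111, 8376251537],
}
df = pd.DataFrame(data).sort_values(by=['id', 'parent'])

# Count the distinct parents inside each (id, vehicle, color) group.
# transform() broadcasts that count back onto every row of the group,
# so the result is aligned with df and can be used as a boolean mask.
parents_per_group = (
    df.groupby(['id', 'vehicle', 'color'])['parent'].transform('nunique')
)

# Keep only rows whose group contains more than one distinct parent.
result = df[parents_per_group > 1]

# Alternative (assumed equivalent for this data): drop whole groups
# that have a single distinct parent.
alt = df.groupby(['id', 'vehicle', 'color']).filter(
    lambda g: g['parent'].nunique() > 1
)

print(result)
```

The id 3 rows drop out because their group has a count of 1 (both parents are 'f'), while the id 1 car/red group has a count of 2 ('a' and 'x'), so all three of its rows stay, including the two that share parent 'a'. The transform version is usually the faster of the two on large frames, since filter calls a Python function once per group.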