Pandas self-join on non-unique values
I have the following table:
ind_ID pair_ID orig_data
0 A 1 W
1 B 1 X
2 C 2 Y
3 D 2 Z
4 A 3 W
5 C 3 X
6 B 4 Y
7 D 4 Z
Each row has an individual ID (ind_ID) and a pair_ID that it shares with exactly one other row. I want to do a self-join, so that every row has its original data and the data of the row it shares a pair_ID with:
ind_ID pair_ID orig_data partner_data
0 A 1 W X
1 B 1 X W
2 C 2 Y Z
3 D 2 Z Y
4 A 3 W X
5 C 3 X W
6 B 4 Y Z
7 D 4 Z Y
I have tried:
df.join(df, on='pair_ID')
But obviously, since pair_ID values are not unique, I get:
ind_ID pair_ID orig_data partner_data
0 A 1 W NaN
1 B 1 X NaN
2 C 2 Y NaN
3 D 2 Z NaN
4 A 3 W NaN
5 C 3 X NaN
6 B 4 Y NaN
7 D 4 Z NaN
I've also thought about creating a new column that concatenates ind_ID+pair_ID, which would be unique, but then the join would not know what to match on.
Is it possible to do a self-join on pair_ID where each row is joined with the matching row that is not itself?
In your case (where each pair_ID is shared by exactly two rows), you can probably just group by pair_ID and use transform to reverse the order of the values within each group, e.g.:
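# Reverse the raw values of each pair; returning an array (not a Series) means no index alignment can undo the reversal
df['partner_data'] = df.groupby('pair_ID')['orig_data'].transform(lambda s: s.to_numpy()[::-1])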
Which gives you:
ind_ID pair_ID orig_data partner_data
0 A 1 W X
1 B 1 X W
2 C 2 Y Z
3 D 2 Z Y
4 A 3 W X
5 C 3 X W
6 B 4 Y Z
7 D 4 Z Y
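If you would rather express this as an actual self-join, here is a minimal sketch using merge. It assumes the same three columns as in the question; the helper names (left, right, partner_index) are made up for illustration. The idea is to join the frame to itself on pair_ID, which yields every combination within a pair, and then drop the combinations where a row was matched with itself:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ind_ID": list("ABCDACBD"),
    "pair_ID": [1, 1, 2, 2, 3, 3, 4, 4],
    "orig_data": list("WXYZWXYZ"),
})

# Expose the row label as a column so "not itself" can be tested explicitly
left = df.reset_index()
right = left[["index", "pair_ID", "orig_data"]].rename(
    columns={"index": "partner_index", "orig_data": "partner_data"}
)

# Self-join on pair_ID: each pair of rows produces 2 x 2 = 4 combinations
merged = left.merge(right, on="pair_ID")

# Keep only the combinations where a row is matched with the other row
merged = merged[merged["index"] != merged["partner_index"]]

# Restore the original row labels and ordering
result = merged.set_index("index").drop(columns="partner_index").sort_index()
result.index.name = None

print(result)
```

Unlike the reversal trick, this does not depend on there being exactly two rows per pair_ID: with larger groups, each row would simply be repeated once for every other row in its group.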