How to compare different columns from two Dataframes in Python
I have 2 DataFrames that I want to compare. Please find the details below; any help is appreciated.
df1 shows the relationships between IDs:
df1 =
IDA IDB Relationship
A100 A200 Parent
A200 A500 Spouse
A111 A112 Child
A112 A111 Parent
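df2 contains lists of IDs. For each row, I want to check against df1 whether any kind of relationship exists between an ID of the first party (Sender) and an ID of the second party (Receiver).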
df2 =
Sender Receiver
[A900,A200] [A500,A220]
[A150,A100] [A400]
[A400,A112] [A500]
[A700,A112] [A111,A001]
Here is my expected output, with explanations:
Output =
Sender Receiver Relationship
[A900,A200] [A500,A220] Spouse #A200 and A500
[A150,A100] [A400] NAN #No match
[A400,A112] [A500] NAN #No match
[A700,A112] [A111,A001] Parent #A112 and A111
I can't test this because you didn't provide a data sample, but something like this should work:
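Output = df2.copy()
detected_relations = []
# Assumes each Sender/Receiver cell holds a list of IDs, not a string
for _, transaction in df2.iterrows():  # iterrows() yields (index, row) pairs
    Receiver = transaction.Receiver
    Sender = transaction.Sender
    # Rows of df1 that link a sender to a receiver, in either direction
    df = df1[(df1.IDA.isin(Sender) & df1.IDB.isin(Receiver)) | (df1.IDB.isin(Sender) & df1.IDA.isin(Receiver))]
    # Join every match (the last sample row gives "Child, Parent"); NaN if none
    detected_relations.append(", ".join(df.Relationship) if not df.empty else float("nan"))
Output["Relationship"] = detected_relations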
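The expected output in the question lists only the relationship read from sender to receiver (Parent for A112 and A111, not Child). Below is a minimal sketch of that one-directional check. It assumes the Sender and Receiver cells are Python lists of ID strings, which the question does not confirm, and `first_relation` is a hypothetical helper name.

```python
import pandas as pd

# Sample data from the question; list-valued cells are an assumption
df1 = pd.DataFrame({
    "IDA": ["A100", "A200", "A111", "A112"],
    "IDB": ["A200", "A500", "A112", "A111"],
    "Relationship": ["Parent", "Spouse", "Child", "Parent"],
})
df2 = pd.DataFrame({
    "Sender": [["A900", "A200"], ["A150", "A100"], ["A400", "A112"], ["A700", "A112"]],
    "Receiver": [["A500", "A220"], ["A400"], ["A500"], ["A111", "A001"]],
})

def first_relation(row):
    """Hypothetical helper: first relationship with IDA a sender and IDB a receiver."""
    mask = df1["IDA"].isin(row["Sender"]) & df1["IDB"].isin(row["Receiver"])
    matches = df1.loc[mask, "Relationship"]
    # NaN when no df1 row links a sender to a receiver
    return matches.iloc[0] if not matches.empty else float("nan")

output = df2.copy()
output["Relationship"] = df2.apply(first_relation, axis=1)
print(output)
```

If several df1 rows match a single df2 row, this sketch keeps only the first one; collecting all of them would need a different return value.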
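You can extract the information into native Python data structures and then merge it back into the original DataFrames.
To do that, I would first pair up the values in the Sender and Receiver columns of df2: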
def make_pairs(row):
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    pairs = [(s, r) for s in senders for r in receivers]
    return pairs
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()
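To see what the pairing step produces, here is a small sketch on a single row. It assumes the cells are strings such as "[A900,A200]", which the `.replace` calls imply. `parse_ids` is a hypothetical helper name, and unlike the code above it also strips stray whitespace around each ID.

```python
from itertools import product

def parse_ids(cell):
    # Hypothetical helper: turn "[A900,A200]" into ["A900", "A200"]
    return [item.strip() for item in cell.strip("[]").split(",")]

def make_pairs(row):
    # Every (sender, receiver) combination for one row
    return list(product(parse_ids(row["Sender"]), parse_ids(row["Receiver"])))

# A plain dict stands in for one df2 row here
row = {"Sender": "[A900,A200]", "Receiver": "[A500,A220]"}
pairs = make_pairs(row)
print(pairs)
# [('A900', 'A500'), ('A900', 'A220'), ('A200', 'A500'), ('A200', 'A220')]
```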
Then map the IDA and IDB combinations in df1 into a dictionary:
rels = {(ida, idb): rel for ida, idb, rel in df1.values}
You can then use a dict comprehension (or even a simple for loop) to subset the values of interest:
rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}
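For readers who prefer the plain for loop, the comprehension can be unrolled as in the sketch below. The two input dicts are written out by hand from the sample data in the question so that the snippet runs on its own.

```python
# Hand-written stand-ins for the structures built in the previous steps
rels = {
    ("A100", "A200"): "Parent",
    ("A200", "A500"): "Spouse",
    ("A111", "A112"): "Child",
    ("A112", "A111"): "Parent",
}
send_receive_combinations = {
    0: [("A900", "A500"), ("A900", "A220"), ("A200", "A500"), ("A200", "A220")],
    1: [("A150", "A400"), ("A100", "A400")],
    2: [("A400", "A500"), ("A112", "A500")],
    3: [("A700", "A111"), ("A700", "A001"), ("A112", "A111"), ("A112", "A001")],
}

rel_pairs = {}
for key, combination in send_receive_combinations.items():
    for pair in combination:
        if pair in rels:
            # A later match for the same row overwrites an earlier one,
            # just as it does in the comprehension
            rel_pairs[key] = rels[pair]

print(rel_pairs)
# {0: 'Spouse', 3: 'Parent'}
```

Rows without any match never get a key in `rel_pairs`, which is why mapping the index through this dict leaves them as NaN.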
Finally, we can merge this dict back into df2:
df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
Sender Receiver relationship
#0 [A900,A200] [A500,A220] Spouse
#1 [A150,A100] [A400] NaN
#2 [A400,A112] [A500] NaN
#3 [A700,A112] [A111,A001] Parent