
How to compare different columns from two DataFrames in Python

I have 2 DataFrames that I want to compare. Please find the details below; any help is appreciated.

df1 shows the relationships between IDs

df1 = 

IDA   IDB   Relationship
A100  A200   Parent
A200  A500   Spouse
A111  A112   Child
A112  A111   Parent

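df2 contains lists of IDs. I want to check it against df1 to see whether any kind of relationship exists between an ID from the first party (Sender) and an ID from the second party (Receiver)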

df2 = 

Sender      Receiver
[A900,A200] [A500,A220]
[A150,A100] [A400]
[A400,A112] [A500]
[A700,A112] [A111,A001]

Here is my expected output, with explanations

Output =
 
Sender      Receiver     Relationship
[A900,A200] [A500,A220]  Spouse         #A200 and A500 
[A150,A100] [A400]       NAN            #No match
[A400,A112] [A500]       NAN            #No match
[A700,A112] [A111,A001]  Parent         #A112 and A111

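I can't test it because you didn't provide a data sample, but something like this should work:

import numpy as np

Output = df2.copy()
detected_relations = []

# Assumes Sender and Receiver cells hold Python lists of IDs, not strings.
for _, transaction in df2.iterrows():
    Receiver = transaction.Receiver
    Sender = transaction.Sender

    # Match df1 rows in either direction (sender -> receiver or receiver -> sender).
    # Drop the second clause to match only sender -> receiver.
    df = df1[(df1.IDA.isin(Sender) & df1.IDB.isin(Receiver)) | (df1.IDB.isin(Sender) & df1.IDA.isin(Receiver))]

    # Keep every matched relationship for this row, or NaN when there is none.
    relations = df.Relationship.tolist()
    detected_relations.append(", ".join(relations) if relations else np.nan)

Output["Relationship"] = detected_relations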
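
To try the loop on the sample data from the question, here is a minimal self-contained sketch. It assumes the Sender and Receiver cells hold real Python lists, and it checks only the sender -> receiver direction. The loop above also checks the reverse direction, which for the last row would additionally match the Child row of df1. Taking the first match per row is likewise an assumption, not something the question specifies.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; list-valued cells are an assumption.
df1 = pd.DataFrame({
    "IDA": ["A100", "A200", "A111", "A112"],
    "IDB": ["A200", "A500", "A112", "A111"],
    "Relationship": ["Parent", "Spouse", "Child", "Parent"],
})
df2 = pd.DataFrame({
    "Sender": [["A900", "A200"], ["A150", "A100"], ["A400", "A112"], ["A700", "A112"]],
    "Receiver": [["A500", "A220"], ["A400"], ["A500"], ["A111", "A001"]],
})

detected_relations = []
for _, transaction in df2.iterrows():
    # A df1 row matches when its IDA is among the senders and its IDB among the receivers.
    mask = df1["IDA"].isin(transaction["Sender"]) & df1["IDB"].isin(transaction["Receiver"])
    matches = df1.loc[mask, "Relationship"]
    # Keep the first match, or NaN when nothing matched.
    detected_relations.append(matches.iloc[0] if not matches.empty else np.nan)

output = df2.copy()
output["Relationship"] = detected_relations
print(output)
```

Iterating row by row is simple to follow, but it filters df1 once per transaction, so it may be slow on large inputs.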

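You can extract the information into native Python data structures and then merge it back into the original DataFrames.

To do that, I would first build every pair from the Sender and Receiver columns in df2: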

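def make_pairs(row):
    # Assumes each cell is a string such as "[A900,A200]", not a Python list.
    senders = row['Sender'].replace("[", "").replace("]", "").split(",")
    receivers = row['Receiver'].replace("[", "").replace("]", "").split(",")
    # Every (sender, receiver) combination for this row.
    pairs = [(s, r) for s in senders for r in receivers]
    return pairs

# Maps each df2 row index to its list of (sender, receiver) pairs.
send_receive_combinations = df2.apply(make_pairs, axis=1).to_dict()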
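
The string handling in make_pairs only works when the cells are strings. If the cells might instead be real lists, a variant along these lines could accept both. This is a sketch: `to_ids` and `make_pairs_any` are hypothetical helper names, not part of the answer.

```python
from itertools import product

import pandas as pd


def to_ids(cell):
    """Return a list of IDs from either a real list or a string like '[A900,A200]'."""
    if isinstance(cell, (list, tuple)):
        return list(cell)
    # Strip the surrounding brackets, then split on commas and trim each ID.
    return [part.strip() for part in str(cell).strip("[]").split(",") if part.strip()]


def make_pairs_any(row):
    """Every (sender, receiver) combination for one row, senders varying slowest."""
    return list(product(to_ids(row["Sender"]), to_ids(row["Receiver"])))


# The same row, once as strings and once as lists.
as_strings = pd.DataFrame({"Sender": ["[A900,A200]"], "Receiver": ["[A500,A220]"]})
as_lists = pd.DataFrame({"Sender": [["A900", "A200"]], "Receiver": [["A500", "A220"]]})

pairs_from_strings = as_strings.apply(make_pairs_any, axis=1).to_dict()
pairs_from_lists = as_lists.apply(make_pairs_any, axis=1).to_dict()
print(pairs_from_lists)
```

`itertools.product` yields the pairs in the same order as the nested comprehension, so the rest of the steps should work unchanged.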

Then map the IDA and IDB combinations in df1 into a dictionary:

rels = {(ida, idb): rel for ida, idb, rel in df1.values}

Then a dict comprehension (or even a simple for loop) can subset the values of interest:

rel_pairs = {key: rels[pair] for key, combination in send_receive_combinations.items() for pair in combination if pair in rels}
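
For readers who prefer the plain for loop, an equivalent form might look like the sketch below. The two input dictionaries are written out by hand from part of the question's sample data so the snippet runs on its own.

```python
# Hand-written inputs shaped like the structures built in the previous steps.
send_receive_combinations = {
    0: [("A900", "A500"), ("A900", "A220"), ("A200", "A500"), ("A200", "A220")],
    1: [("A150", "A400"), ("A100", "A400")],
}
rels = {("A100", "A200"): "Parent", ("A200", "A500"): "Spouse"}

rel_pairs = {}
for key, combination in send_receive_combinations.items():
    for pair in combination:
        if pair in rels:
            # A later matching pair overwrites an earlier one for the same row,
            # exactly as in the dict comprehension.
            rel_pairs[key] = rels[pair]

print(rel_pairs)
```

Rows without any matching pair never get a key, which is why they end up as NaN after the later `map` step.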

Finally, we can merge this dict back into df2:

df2['relationship'] = df2.index
df2['relationship'] = df2['relationship'].map(rel_pairs)
print(df2)
#        Sender     Receiver relationship
#0  [A900,A200]  [A500,A220]       Spouse
#1  [A150,A100]       [A400]          NaN
#2  [A400,A112]       [A500]          NaN
#3  [A700,A112]  [A111,A001]       Parent
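
As a different technique from the dictionary approach above, the same lookup can probably be expressed with pandas alone, using `explode` and `merge`. The sketch below assumes list-valued cells and keeps the first match per row; the column name `row` is introduced only for illustration.

```python
import pandas as pd

# Sample data copied from the question; list-valued cells are an assumption.
df1 = pd.DataFrame({
    "IDA": ["A100", "A200", "A111", "A112"],
    "IDB": ["A200", "A500", "A112", "A111"],
    "Relationship": ["Parent", "Spouse", "Child", "Parent"],
})
df2 = pd.DataFrame({
    "Sender": [["A900", "A200"], ["A150", "A100"], ["A400", "A112"], ["A700", "A112"]],
    "Receiver": [["A500", "A220"], ["A400"], ["A500"], ["A111", "A001"]],
})

# One row per (Sender, Receiver) combination, remembering the original df2 row.
pairs = (
    df2.rename_axis("row")
       .reset_index()
       .explode("Sender")
       .reset_index(drop=True)
       .explode("Receiver")
       .reset_index(drop=True)
)

# Keep only the combinations that appear as an (IDA, IDB) row in df1.
matched = pairs.merge(df1, left_on=["Sender", "Receiver"], right_on=["IDA", "IDB"], how="inner")

# First matching relationship per original row; rows without a match become NaN
# because the assignment aligns on df2's index.
df2["Relationship"] = matched.groupby("row")["Relationship"].first()
print(df2)
```

This matches only the sender -> receiver direction, like the dictionary lookup.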
