
How to merge two columns of different data frames and write "True" in a new column if a match is found, using pandas

I'm working on a pandas project. I have two data frames similar to the ones below:

DF1 :

Data1    Data2      Data3
Head     Cat        Fire
Limbs    Dog        Snow
Eyes     Fish       Water
Mouth    Dragon     Air


DF2 :

 Data1     Data2      
 Limbs     Dog        
 Mouth     Dragon        
 Head      Cat 

Based on the above DataFrames, I need to compare both DFs and, if a match is found, write "True" in a separate column, else "False".

Example: say I pick the first row of DF2, with the combination (Limbs, Dog). This should be searched for in DF1. As we can see, the combination exists in the 2nd row, so DF1's Data3 value "Snow" should be written to DF2's Data3 column, and "True" should be written to a new column because a match was found.

Expected output:

Data1    Data2      Data3    Data4
Limbs    Dog        Snow     True
Mouth    Dragon     Air      True
Head     Cat        Fire     True
Eyes     Fish       Water    False

So far, I have tried merging the two dataframes.

Current code:

df3 = pd.merge(df, valid_req, on=['Data1', 'Data2'])

df3

Data1    Data2      Data3
Limbs    Dog        Snow
Mouth    Dragon     Air
Head     Cat        Fire

How can I achieve the expected output?

You can assign a temporary column to df2 and then merge using how='left':

In [1665]: df2['tmp'] = 1

In [1666]: x = df1.merge(df2, on=['Data1', 'Data2'], how='left')

In [1667]: x
Out[1667]: 
   Data1   Data2  Data3  tmp
0   Head     Cat   Fire  1.0
1  Limbs     Dog   Snow  1.0
2   Eyes    Fish  Water  NaN
3  Mouth  Dragon    Air  1.0

Finally, use numpy.where to assign the new column Data4: True where x['tmp'] == 1, else False:

In [1668]: import numpy as np

In [1669]: x['Data4'] = np.where(x.tmp.eq(1), True, False)

Drop the unnecessary tmp column using df.drop. Then x is your final output:

In [1671]: x.drop(columns='tmp', inplace=True)

In [1672]: x
Out[1672]: 
   Data1   Data2  Data3  Data4
0   Head     Cat   Fire   True
1  Limbs     Dog   Snow   True
2   Eyes    Fish  Water  False
3  Mouth  Dragon    Air   True
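
The steps above can be reproduced end to end with the sample data from the question. This is only a minimal sketch: the frames are rebuilt by hand, the name marked is made up for illustration, and notna() is swapped in for np.where as a shorter way to test whether the marker survived the join.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Data1": ["Head", "Limbs", "Eyes", "Mouth"],
    "Data2": ["Cat", "Dog", "Fish", "Dragon"],
    "Data3": ["Fire", "Snow", "Water", "Air"],
})
df2 = pd.DataFrame({
    "Data1": ["Limbs", "Mouth", "Head"],
    "Data2": ["Dog", "Dragon", "Cat"],
})

# Mark every df2 row. assign() returns a copy, so df2 itself is not modified.
marked = df2.assign(tmp=1)

# Left join: df1 rows without a matching (Data1, Data2) pair get NaN in tmp.
x = df1.merge(marked, on=["Data1", "Data2"], how="left")

# A non-null marker means the pair was found in df2.
x["Data4"] = x["tmp"].notna()
x = x.drop(columns="tmp")
print(x)
```

Using assign() instead of df2['tmp'] = 1 leaves df2 without the helper column, which may matter if df2 is reused later.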

Use DataFrame.merge with a left join and the indicator=True parameter, then create the new column by comparing the _merge column with 'both'; DataFrame.pop removes _merge in the same step:

df = df1.merge(df2, on=['Data1', 'Data2'], how='left', indicator=True)
df['Data4'] = df.pop('_merge').eq('both')
print(df)
   Data1   Data2  Data3  Data4
0   Head     Cat   Fire   True
1  Limbs     Dog   Snow   True
2   Eyes    Fish  Water  False
3  Mouth  Dragon    Air   True
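
One thing to watch with a left merge: if df2 contains the same (Data1, Data2) pair more than once, the matching df1 row is repeated in the result. A possible guard, sketched below, is to merge against the de-duplicated key columns only. The repeated (Limbs, Dog) row in df2 and the names keys, lookup, and out are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Data1": ["Head", "Limbs", "Eyes", "Mouth"],
    "Data2": ["Cat", "Dog", "Fish", "Dragon"],
    "Data3": ["Fire", "Snow", "Water", "Air"],
})
# Hypothetical df2 with a repeated (Limbs, Dog) pair.
df2 = pd.DataFrame({
    "Data1": ["Limbs", "Limbs", "Mouth", "Head"],
    "Data2": ["Dog", "Dog", "Dragon", "Cat"],
})

keys = ["Data1", "Data2"]

# Keep only the key columns and drop repeated pairs before merging,
# so each df1 row can match at most one lookup row.
lookup = df2[keys].drop_duplicates()

out = df1.merge(lookup, on=keys, how="left", indicator=True)
out["Data4"] = out.pop("_merge").eq("both")
print(out)
```

With the de-duplicated lookup, out has exactly as many rows as df1.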

Simply use the apply function on DF1 to create Data4:

import pandas as pd

DF1 = pd.DataFrame([
    ["Head", "Cat", "Fire"],
    ["Limbs", "Dog", "Snow"],
    ["Eyes", "Fish", "Water"],
    ["Mouth", "Dragon", "Air"]
], columns=["Data1", "Data2", "Data3"])

DF2 = pd.DataFrame([
    ["Limbs", "Dog", "Snow"],
    ["Mouth", "Dragon", "Air"],
    ["Head", "Cat", "Fire"]
], columns=["Data1", "Data2", "Data3"])

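# Row-wise check: True when this row's (Data1, Data2) pair appears in DF2
DF1["Data4"] = DF1.apply(
    lambda row: ((DF2["Data1"] == row["Data1"]) & (DF2["Data2"] == row["Data2"])).any(),
    axis=1,
)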
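
Since the question asks for a match on the (Data1, Data2) combination, it may be worth testing the pair rather than a single column. A vectorized sketch using MultiIndex.isin follows. The extra ("Eyes", "Shark") row in DF2 is a made-up example added to show where a check on Data1 alone would differ, and keys and data1_only are illustrative names.

```python
import pandas as pd

DF1 = pd.DataFrame([
    ["Head", "Cat", "Fire"],
    ["Limbs", "Dog", "Snow"],
    ["Eyes", "Fish", "Water"],
    ["Mouth", "Dragon", "Air"],
], columns=["Data1", "Data2", "Data3"])

# Hypothetical DF2: the last row shares Data1 with DF1 but not Data2.
DF2 = pd.DataFrame([
    ["Limbs", "Dog"],
    ["Mouth", "Dragon"],
    ["Head", "Cat"],
    ["Eyes", "Shark"],
], columns=["Data1", "Data2"])

keys = ["Data1", "Data2"]

# Treat each row's (Data1, Data2) as one tuple and test membership in DF2's tuples.
DF1["Data4"] = pd.MultiIndex.from_frame(DF1[keys]).isin(
    pd.MultiIndex.from_frame(DF2[keys])
)

# For comparison: matching on Data1 alone also flags the Eyes row.
data1_only = DF1["Data1"].isin(DF2["Data1"])
print(DF1)
print(data1_only.tolist())
```

This avoids a Python-level loop over rows, which tends to matter once DF1 grows large.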


 