Merge based on partial string match in pandas dfs

I have a DataFrame that looks like this:

first_name last_name
John       Doe
Kelly      Stevens
Dorey      Chang

and another that looks like this:

name             email
John Doe         jdoe23@gmail.com
Kelly M Stevens  kelly.stevens@hotmail.com
D Chang          chang79@yahoo.com

I want to merge these two tables so that the end result is:

first_name last_name email
John       Doe       jdoe23@gmail.com
Kelly      Stevens   kelly.stevens@hotmail.com
Dorey      Chang     chang79@yahoo.com

I can't merge on name, but every email contains the person's last name, even though the overall format differs. Is there a way to merge these using only a partial string match?

I've tried things like this with no success:

df1['email'] = df2[df2['email'].str.contains(df1['last_name']) == True]

If I understand correctly, you can merge on the result of an extract, taking the last word of name as the last name:

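# Take the last word of df2['name'] as last_name, then left-merge on it.
# The raw string avoids an invalid-escape warning for \w, and
# expand=False makes extract return a Series rather than a DataFrame.
df1.merge(df2.assign(last_name=df2['name'].str.extract(r' (\w+)$', expand=False))
             .drop('name', axis=1),
          on='last_name',
          how='left')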

Output:

  first_name last_name                      email
0       John       Doe           jdoe23@gmail.com
1      Kelly   Stevens  kelly.stevens@hotmail.com
2      Dorey     Chang          chang79@yahoo.com
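
The merge above keys on the last word of name. Since the question asks about matching the last name inside the email instead, here is a minimal sketch of that variant. It is not part of the original answer: it assumes pandas 1.2 or later (for how="cross"), and the names pairs, is_match, matched and result are only illustrative.

```python
import pandas as pd

# Sample frames rebuilt from the question.
df1 = pd.DataFrame({
    "first_name": ["John", "Kelly", "Dorey"],
    "last_name": ["Doe", "Stevens", "Chang"],
})
df2 = pd.DataFrame({
    "name": ["John Doe", "Kelly M Stevens", "D Chang"],
    "email": ["jdoe23@gmail.com", "kelly.stevens@hotmail.com", "chang79@yahoo.com"],
})

# Pair every row of df1 with every row of df2 (how="cross" needs pandas >= 1.2).
pairs = df1.merge(df2, how="cross")

# Keep only the pairs whose email contains the last name, ignoring case.
is_match = [
    last.lower() in email.lower()
    for last, email in zip(pairs["last_name"], pairs["email"])
]
matched = pairs.loc[is_match, ["last_name", "email"]]

# Left-merge back so that people without a matching email are kept with NaN.
result = df1.merge(matched, on="last_name", how="left")
print(result)
```

The cross join builds len(df1) * len(df2) rows, so this is only reasonable for small frames. A short last name that happens to appear inside several emails would also produce several rows for that person, so the result may need deduplicating on real data.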
