简体   繁体   中英

Compare two data-frames with different column names and update first data-frame with the column from second data-frame

I am working on two data-frames which have different column names and dimensions.

First data-frame "df1" contains single column "name" that has names need to be located in second data-frame. If matched, value from df2 first column df2[0] needs to be returned and added in the result_df

Second data-frame "df2" has multiple columns and no header. This contains all the possible diminutive names and full names. Any of the column can have the "name" that needs to be matched

Goal: Locate the name in "df1" in "df2" and if it is matched, return the value from first column of the df2 and add in the respective row of df1

df1

name
ab
alex
bob
robert
bill

df2

0 1 2 3
abram ab
robert rob bob robbie
alexander alex al
william bill

result_df

name matched_name
ab abram
alex alexander
bob robert
robert robert
bill william

The code i have written so far is giving error. I need to write it as an efficient code as it will be checking millions of entries in df1 with df2:

''' result_df = process_name(df1, df2)

def process_name(df1, df2):

for elem in df2.values:
    
    if elem in df1['name']:
        df1["matched_name"] = df2[0]

'''

Try via concat() , merge() , drop() and rename() and reset_index() method:

df=(pd.concat((df1.merge(df2,left_on='name',right_on=x) for x in df2.columns))
    .drop(['1','2','3'],1)
    .rename(columns={'0':'matched_name'})
    .reset_index(drop=True))

Output of df :

    name    matched_name
0   robert  robert
1   ab      abram
2   alex    alexander
3   bill    william
4   bob     robert

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM