
Merge 2 data frames with multiple conditions, on ID and column names, automatically based on common columns

I have 2 large data frames with thousands of columns for each df. I need to left-join the two tables, namely df1 and df2. However, I don't think I'll be able to manually list all of the common columns/keys between the 2 data frames. Below is an example of the data frames:

import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'],
                    'test': [0, 0, 0],
                    'beautiful': [0, 0, 0],
                    'crazy': [0, 0, 0],
                    'word': [0, 0, 0]})

  id  test  beautiful  crazy  word
0  a     0          0      0     0
1  b     0          0      0     0
2  c     0          0      0     0

df2 = pd.DataFrame({'id': ['a', 'b', 'c'],
                    'test': [1, 0, 0],
                    'autumn': [0, 1, 0],
                    'fall': [0, 0, 1],
                    'word': [1, 1, 0]})

  id  test  autumn  fall  word
0  a     1       0     0     1
1  b     0       1     0     1
2  c     0       0     1     0

The result I need:

df_result = pd.DataFrame({'id': ['a', 'b', 'c'],
                          'test': [1, 0, 0],
                          'beautiful': [0, 0, 0],
                          'crazy': [0, 0, 0],
                          'word': [1, 1, 0]})

  id  test  beautiful  crazy  word
0  a     1          0      0     1
1  b     0          0      0     1
2  c     0          0      0     0

As the code shows, I need to join the two data frames on two conditions: where the id matches and the column name matches, the value from df2 should replace the value in df1. Columns that exist only in df2 (autumn, fall) should not be added. I found a post with a similar problem, but it was left unsolved. Thanks in advance.

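The DataFrame.update() method should do it. It aligns on the row index and the column labels, so it works directly here because both frames have the same default index and the same row order.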

df1.update(df2)

After running this line, we can test to see if it matches your desired output:

print(df1 == df_result)

Here's the result:

     id  test  beautiful  crazy  word
0  True  True       True   True  True
1  True  True       True   True  True
2  True  True       True   True  True

And here is df1 itself (.update works in place):

  id  test  beautiful  crazy  word
0  a     1          0      0     1
1  b     0          0      0     1
2  c     0          0      0     0

You just need to set the index to id and use pandas.DataFrame.update. There is no need to worry about the other columns in df2, because the documentation says:

join : {'left'}, default 'left'
Only left join is implemented, keeping the index and columns of the original object.

df1 = df1.set_index('id')
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)

Output df1:

  id  test  beautiful  crazy  word
0  a     1          0      0     1
1  b     0          0      0     1
2  c     0          0      0     0
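To illustrate why setting the index matters, here is a minimal sketch in which df2's rows arrive in a different order. The name df2_shuffled and its row order are assumptions made for the illustration; they are not part of the question. Because update pairs rows by index label, indexing both frames by id keeps each value attached to the right id.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'],
                    'test': [0, 0, 0],
                    'beautiful': [0, 0, 0],
                    'crazy': [0, 0, 0],
                    'word': [0, 0, 0]})

# Hypothetical variant of df2: same data as in the question, rows reordered.
df2_shuffled = pd.DataFrame({'id': ['c', 'a', 'b'],
                             'test': [0, 1, 0],
                             'autumn': [0, 0, 1],
                             'fall': [1, 0, 0],
                             'word': [0, 1, 1]})

# set_index returns a new frame, so df1 itself is not modified here.
updated = df1.set_index('id')

# update aligns on the id index and on column names; columns that exist
# only in df2_shuffled (autumn, fall) are ignored.
updated.update(df2_shuffled.set_index('id'))

updated = updated.reset_index()
print(updated)
```

Note that update only copies non-NA values, so a missing value in df2 would leave the existing df1 value in place.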

Try this one:

# Columns that appear in both data frames
df1_columns = set(df1.columns)
df2_columns = set(df2.columns)
intersection = list(df1_columns.intersection(df2_columns))

# Left merge using every shared column as a join key
df_merge = df1.merge(df2, how='left', on=intersection)

To clarify: df1 should keep its own set of columns, and only the values in the columns it shares with df2 should be transferred from df2 into df1.
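For what it's worth, here is a rough sketch of the difference, using the sample frames from the question. Joining on every shared column treats test and word as match keys, so their values in df1 are never overwritten. The second half is one possible explicit alternative (not taken from any of the answers): it finds the shared columns automatically and assigns only those, row-aligned on id. The names common, shared_ids, merged and result are made up for the illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'id': ['a', 'b', 'c'],
                    'test': [0, 0, 0],
                    'beautiful': [0, 0, 0],
                    'crazy': [0, 0, 0],
                    'word': [0, 0, 0]})
df2 = pd.DataFrame({'id': ['a', 'b', 'c'],
                    'test': [1, 0, 0],
                    'autumn': [0, 1, 0],
                    'fall': [0, 0, 1],
                    'word': [1, 1, 0]})

# Columns present in both frames, excluding the key column.
common = df1.columns.intersection(df2.columns).drop('id')

# Joining on id plus every shared column: a row matches only when id, test
# and word are all equal, so test and word keep df1's values and the
# df2-only columns are appended (NaN where no row matched).
merged = df1.merge(df2, how='left', on=['id', *common])

# Explicit alternative: align on id, then overwrite only the shared columns.
left = df1.set_index('id')
right = df2.set_index('id')
shared_ids = left.index.intersection(right.index)

result = left.copy()
result.loc[shared_ids, common] = right.loc[shared_ids, common]
result = result.reset_index()
print(result)
```

Unlike update, this assignment also copies missing values from df2, so which one fits depends on whether NaN in df2 should overwrite df1.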
