I have 2 large data frames with thousands of columns for each df. I need to left-join the two tables, namely df1 and df2. However, I don't think I'll be able to manually list all of the common columns/keys between the 2 data frames. Below is an example of the data frames:
df1 = pd.DataFrame({'id' : ['a', 'b', 'c'], # Define dictionary
'test' : [0,0,0],
'beautiful' : [0,0,0],
'crazy' : [0,0,0],
'word' : [0,0,0]})
id test beautiful crazy word
0 a 0 0 0 0
1 b 0 0 0 0
2 c 0 0 0 0
df2 = pd.DataFrame({'id' : ['a', 'b', 'c'], # Define dictionary
'test' : [1,0,0],
'autumn' : [0,1,0],
'fall' : [0,0,1],
'word' : [1,1,0]})
id test autumn fall word
0 a 1 0 0 1
1 b 0 1 0 1
2 c 0 0 1 0
df_result = pd.DataFrame({'id':['a','b','c'],
'test' : [1,0,0],
'beautiful' : [0,0,0],
'crazy' : [0,0,0],
'word' : [1,1,0]})
id test beautiful crazy word
0 a 1 0 0 1
1 b 0 0 0 1
2 c 0 0 0 0
As you can see from the code, I need to join the two data frames based on two conditions. If id
matches and column names
match, then transfer the value from df2 to df1. I found this post with a similar problem to mine, but it is left unsolved. Thanks in advance
The DataFrame.update() method should do it.
df1.update(df2)
After running this line, we can test to see if it matches your desired output:
print(df1 == df_result)
Here's the result:
id test beautiful crazy word
0 True True True True True
1 True True True True True
2 True True True True True
Additionally, df1 itself (.update works in place):
id test beautiful crazy word
0 a 1 0 0 1
1 b 0 0 0 1
2 c 0 0 0 0
You just need to set the index to id
and use pandas.DataFrame.update
No need to worry about other columns in df2
because:
join{'left'}: default 'left'
Only left join is implemented, keeping the index and columns of the original object.
df1 = df1.set_index('id')
df1.update(df2.set_index('id'))
df1.reset_index(inplace=True)
Output df1
:
id test beautiful crazy word
0 a 1 0 0 1
1 b 0 0 0 1
2 c 0 0 0 0
try this one
# Intersections
df1_columns = set(df1.columns)
df2_columns = set(df2.columns)
intersection = list(df1_columns.intersection(df2_columns))
#Merging based on similar columns
df_merge = df1.merge(df2, how = 'left', on = intersection)
I need to clarify, based on similar columns between df1 & df2, df1 stays unchanged and values transfer from df2 to df1 (based on intersected columns)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.