Python - Combining two Dataframes based on multiple columns

Question

I know there is a way to do this, and I know I have done it before, but I just can't figure out how, and I don't know what to search for. So I'm sorry if there is an obvious answer to this.

I want to combine DF1 and DF2 so that the output is DF3. In short: I want the rows that are in DF2 but not in DF1 to be added to DF1, with the Sales from DF2. Rows should be matched on the 'Day', 'Month' and 'Hour' columns.

Two dataframes:

#Dataframe 1:
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     5     18
4   10      7     6     12
5   10      7     7     22

#Dataframe 2:
   Day  Month  Hour  Sales
0   10      7     1      0
1   10      7     2      0
2   10      7     3      0
3   10      7     4      0
4   10      7     5      0
5   10      7     6      0
6   10      7     7      0
7   10      7     8      0

And this is the output I want:

#Dataframe 3:
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     4      0
4   10      7     5     18
5   10      7     6     12
6   10      7     7     22
7   10      7     8      0
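To make the answers easy to try, here is a minimal sketch that rebuilds the three tables above as pandas DataFrames. The names df1, df2 and df3_expected are illustrative choices, not from the question; the values are copied from the printed tables.

```python
import pandas as pd

# DF1: actual sales, with hour 4 and hour 8 missing
df1 = pd.DataFrame({
    "Day": [10] * 6,
    "Month": [7] * 6,
    "Hour": [1, 2, 3, 5, 6, 7],
    "Sales": [12, 14, 10, 18, 12, 22],
})

# DF2: the full grid of hours 1..8, all with Sales = 0
df2 = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [0] * 8,
})

# DF3: every (Day, Month, Hour) from DF2, with DF1's Sales where DF1 has the row
df3_expected = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [12, 14, 10, 0, 18, 12, 22, 0],
})
print(df3_expected)
```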

There is probably a merge, join or something similar that does this, but I can't remember it. Any help is much appreciated!

Answer 1

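Use drop_duplicates after concat. Because df1 comes first in the concat and drop_duplicates keeps the first occurrence by default, df1's Sales win wherever a (Day, Month, Hour) appears in both frames:

import pandas as pd

df = pd.concat([df1, df2]).drop_duplicates(['Day', 'Month', 'Hour']).sort_values(['Day', 'Month', 'Hour'])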
Out[19]: 
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     4      0
3   10      7     5     18
4   10      7     6     12
5   10      7     7     22
7   10      7     8      0
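Note that the output above keeps the original index labels, so label 3 appears twice. A minimal sketch of the same concat/drop_duplicates idea with the default keep="first" spelled out and a clean 0..n-1 index, using the sample data from the question:

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

df1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
df2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

# df1 must come first: keep="first" then keeps df1's row whenever
# the same (Day, Month, Hour) key is present in both frames.
df3 = (
    pd.concat([df1, df2])
    .drop_duplicates(subset=keys, keep="first")
    .sort_values(keys)
    .reset_index(drop=True)  # replace the repeated labels with 0..n-1
)
print(df3)
```

Swapping the order to pd.concat([df2, df1]) would keep the zero rows from df2 instead, so the order of the frames is what decides which Sales value survives.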

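A left merge also works (Sales comes back as float, because the rows with no match in df1 are NaN before fillna):

df = df2.drop(columns='Sales').merge(df1, on=['Day', 'Month', 'Hour'], how='left').fillna(0)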
df
Out[26]: 
   Day  Month  Hour  Sales
0   10      7     1   12.0
1   10      7     2   14.0
2   10      7     3   10.0
3   10      7     4    0.0
4   10      7     5   18.0
5   10      7     6   12.0
6   10      7     7   22.0
7   10      7     8    0.0
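If integer Sales are wanted, one option is to fill only the Sales column and cast it back afterwards. A minimal sketch of the same left merge under that assumption, using the question's sample data:

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

df1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
df2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

# Left join: every key row from df2 survives, in df2's order;
# Sales is taken from df1 and is NaN where df1 has no matching row.
df3 = (
    df2.drop(columns="Sales")
    .merge(df1, on=keys, how="left")
    .fillna({"Sales": 0})          # fill only the Sales gaps
    .astype({"Sales": "int64"})    # NaN had forced float64; cast back once filled
)
print(df3)
```

One difference from the concat approach: if df1 contained repeated (Day, Month, Hour) keys, the merge would emit one row per match, whereas drop_duplicates keeps only the first.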

Answer 2

Create a helper column 'tag' in both DF1 and DF2 by joining the Day, Month and Hour values.
Filter DF2 to the rows whose tag is not among DF1's tags.
Concatenate DF1 and the filtered DF2.
Delete the helper column 'tag'.

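import pandas as pd

# Build a "Day-Month-Hour" key in both frames (this adds a 'tag' column to DF1 and DF2)
for DF in [DF1, DF2]:
    DF['tag'] = ( DF.Day.astype(str)   + '-'
                + DF.Month.astype(str) + '-'
                + DF.Hour.astype(str)
                )
# Keep only the DF2 rows whose key does not occur in DF1
cond = ~DF2.tag.isin(DF1.tag)
# Append them to DF1, then sort so the hours come out in order as in DF3
DF3 = pd.concat([DF1, DF2[cond]], ignore_index=True)
DF3 = DF3.sort_values(['Day', 'Month', 'Hour'], ignore_index=True)
# Remove the helper column from the result
del DF3['tag']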
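The same filter-then-concat idea can be sketched without the string tag by comparing (Day, Month, Hour) tuples through pandas.MultiIndex. This is a swapped-in variant, not the answer's method: it leaves DF1 and DF2 unmodified and compares the values with their own dtypes instead of converting them to strings. Sample data is taken from the question.

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

DF1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
DF2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

# One (Day, Month, Hour) tuple per row, in place of the joined string tag
idx1 = pd.MultiIndex.from_frame(DF1[keys])
idx2 = pd.MultiIndex.from_frame(DF2[keys])

# Boolean mask: True for DF2 rows whose key tuple is absent from DF1
cond = ~idx2.isin(idx1)

DF3 = (
    pd.concat([DF1, DF2[cond]], ignore_index=True)
    .sort_values(keys, ignore_index=True)
)
print(DF3)
```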

2 answers

solution1
2 2020-12-10 00:50:06

solution2
0 2020-12-10 01:40:08
