
Python - Combining two Dataframes based on multiple columns

I know there is a way to do this, and I know I have done it before, but I just can't figure out how, and I don't know what to search for. Sorry if the answer is obvious.

I want to combine DF1 and DF2 so that my output is DF3. In short: I want the rows that are in DF2 but not in DF1 to be added to DF1, with the Sales value from DF2. Rows should be matched on the 'Day', 'Month' and 'Hour' columns.

Two dataframes:

#Dataframe 1:
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     5     18
4   10      7     6     12
5   10      7     7     22

#Dataframe 2:
   Day  Month  Hour  Sales
0   10      7     1      0
1   10      7     2      0
2   10      7     3      0
3   10      7     4      0
4   10      7     5      0
5   10      7     6      0
6   10      7     7      0
7   10      7     8      0

And this is the output I want:

#Dataframe 3:
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     4      0
4   10      7     5     18
5   10      7     6     12
6   10      7     7     22
7   10      7     8      0

There is probably a merge, join or similar that allows me to do this but I can't remember. Any help is much appreciated!

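Use drop_duplicates after concat. Because df1 comes first in the concat and drop_duplicates keeps the first occurrence by default, the df1 row wins whenever a Day/Month/Hour combination appears in both frames.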

df = pd.concat([df1,df2]).drop_duplicates(['Day','Month','Hour']).sort_values(['Day','Month','Hour'])
Out[19]: 
   Day  Month  Hour  Sales
0   10      7     1     12
1   10      7     2     14
2   10      7     3     10
3   10      7     4      0
3   10      7     5     18
4   10      7     6     12
5   10      7     7     22
7   10      7     8      0
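
As a self-contained sketch of the concat approach above: the frames below are rebuilt from the tables in the question, while the names `keys` and `df3` are only illustrative. It spells out `keep="first"` (the default) and resets the index, which otherwise keeps the repeated labels visible in the output above.

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Day": [10] * 6,
    "Month": [7] * 6,
    "Hour": [1, 2, 3, 5, 6, 7],
    "Sales": [12, 14, 10, 18, 12, 22],
})
df2 = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [0] * 8,
})

df3 = (
    pd.concat([df1, df2])                         # df1 rows come first
    .drop_duplicates(subset=keys, keep="first")   # so df1 wins on shared keys
    .sort_values(keys)
    .reset_index(drop=True)                       # clean 0..n-1 index
)
print(df3)
```

This relies on the order of the frames passed to concat; swapping them would keep the df2 rows (Sales 0) instead.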

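merge also works. The left join leaves NaN where df1 has no matching row, so Sales becomes float before fillna(0) replaces the gaps:

df = df2.drop(columns='Sales').merge(df1, on=['Day','Month','Hour'], how='left').fillna(0)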
df
Out[26]: 
   Day  Month  Hour  Sales
0   10      7     1   12.0
1   10      7     2   14.0
2   10      7     3   10.0
3   10      7     4    0.0
4   10      7     5   18.0
5   10      7     6   12.0
6   10      7     7   22.0
7   10      7     8    0.0
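
If the float Sales column is unwanted, one possible variation is to cast it back after filling. This is a sketch using the question's sample data; `keys` and `df3` are illustrative names. It also assumes, as the merge answer does, that every key in df1 also appears in df2, since a left join on df2 would drop any df1 row that df2 lacks.

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Day": [10] * 6,
    "Month": [7] * 6,
    "Hour": [1, 2, 3, 5, 6, 7],
    "Sales": [12, 14, 10, 18, 12, 22],
})
df2 = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [0] * 8,
})

df3 = (
    df2.drop(columns="Sales")          # keep only the key columns of df2
    .merge(df1, on=keys, how="left")   # pull Sales from df1 where keys match
    .fillna({"Sales": 0})              # unmatched rows get 0
    .astype({"Sales": "int64"})        # undo the float upcast caused by NaN
)
print(df3)
```

Filling with a constant 0 only reproduces DF2's Sales because they are all 0 in the sample; with other values in DF2 the fill would need to come from df2 itself.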

  1. Create a helper column 'tag' in both DF1 and DF2 by combining the Day, Month and Hour columns.
  2. Filter DF2 down to the rows whose tag is not among DF1's tags.
  3. Concatenate DF1 with the filtered DF2.
  4. Delete the helper column 'tag'.

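import pandas as pd

# Step 1: build a helper 'tag' column from the three key columns.
for DF in [DF1, DF2]:
    DF['tag'] = ( DF.Day.astype(str)   + '-'
                + DF.Month.astype(str) + '-'
                + DF.Hour.astype(str)
                )
# Step 2: keep only the DF2 rows whose tag does not appear in DF1.
cond = ~DF2.tag.isin(DF1.tag)
# Step 3: append those rows to DF1, then sort to get the order shown in the question.
DF3 = pd.concat([DF1, DF2[cond]], ignore_index=True)
DF3 = DF3.sort_values(['Day', 'Month', 'Hour'], ignore_index=True)
# Step 4: drop the helper column (DF1 and DF2 still carry it).
del DF3['tag']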
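
The same filter can probably be written without a helper column by comparing the key columns as a MultiIndex. The sketch below is an alternative to the string tag, not the answerer's method; it uses the question's sample data, and `keys`, `idx1`, `idx2` and `only_in_df2` are illustrative names.

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

# Sample data copied from the question.
df1 = pd.DataFrame({
    "Day": [10] * 6,
    "Month": [7] * 6,
    "Hour": [1, 2, 3, 5, 6, 7],
    "Sales": [12, 14, 10, 18, 12, 22],
})
df2 = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [0] * 8,
})

# Treat each (Day, Month, Hour) triple as one composite key.
idx1 = pd.MultiIndex.from_frame(df1[keys])
idx2 = pd.MultiIndex.from_frame(df2[keys])

# Boolean mask: True for df2 rows whose key is absent from df1.
only_in_df2 = ~idx2.isin(idx1)

df3 = (
    pd.concat([df1, df2[only_in_df2]], ignore_index=True)
    .sort_values(keys)
    .reset_index(drop=True)
)
print(df3)
```

Nothing is added to df1 or df2 here, so there is no 'tag' column to clean up afterwards, and the Sales column keeps its integer dtype.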


 