I know there is a way to do this, and I know I have done it before, but I just can't figure out how, and I don't know how to google it specifically. So I'm sorry if there is an obvious answer to this.
I want to combine DF1 and DF2 so that my output is DF3. In short: I want the rows that are in DF2 but not in DF1 to be added to DF1, with the Sales values from DF2. Rows should be matched on the 'Day', 'Month' and 'Hour' columns.
Two dataframes:
#Dataframe 1:
Day Month Hour Sales
0 10 7 1 12
1 10 7 2 14
2 10 7 3 10
3 10 7 5 18
4 10 7 6 12
5 10 7 7 22
#Dataframe 2:
Day Month Hour Sales
0 10 7 1 0
1 10 7 2 0
2 10 7 3 0
3 10 7 4 0
4 10 7 5 0
5 10 7 6 0
6 10 7 7 0
7 10 7 8 0
And this is the output I want:
#Dataframe 3:
Day Month Hour Sales
0 10 7 1 12
1 10 7 2 14
2 10 7 3 10
3 10 7 4 0
4 10 7 5 18
5 10 7 6 12
6 10 7 7 22
7 10 7 8 0
There is probably a merge, join or something similar that does this, but I can't remember it. Any help is much appreciated!
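To experiment with the answers below, here is a minimal sketch that rebuilds the two example frames from the tables above. The lowercase names df1 and df2 are an assumption chosen to match the names used in the answers.

```python
import pandas as pd

# DF1: only the hours that actually had sales (hours 4 and 8 are absent)
df1 = pd.DataFrame({
    "Day": [10] * 6,
    "Month": [7] * 6,
    "Hour": [1, 2, 3, 5, 6, 7],
    "Sales": [12, 14, 10, 18, 12, 22],
})

# DF2: the full grid of hours 1-8, every row with zero sales
df2 = pd.DataFrame({
    "Day": [10] * 8,
    "Month": [7] * 8,
    "Hour": list(range(1, 9)),
    "Sales": [0] * 8,
})

print(df1)
print(df2)
```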
Use drop_duplicates after concat:
df = pd.concat([df1,df2]).drop_duplicates(['Day','Month','Hour']).sort_values(['Day','Month','Hour'])
Out[19]:
Day Month Hour Sales
0 10 7 1 12
1 10 7 2 14
2 10 7 3 10
3 10 7 4 0
3 10 7 5 18
4 10 7 6 12
5 10 7 7 22
7 10 7 8 0
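Two details are implicit in this answer. First, drop_duplicates keeps the first occurrence by default, so the order inside concat matters: df1 must come first, otherwise the zero rows from df2 would win. Second, the index labels are carried over from both source frames, which is why 3 appears twice in the output above. A minimal sketch that makes both points explicit (the keys list is just a helper name for illustration):

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

df1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
df2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

df3 = (
    pd.concat([df1, df2])                        # df1 first: its rows win on duplicate keys
    .drop_duplicates(subset=keys, keep="first")  # drop df2's copies of keys already in df1
    .sort_values(keys)                           # restore Day/Month/Hour order
    .reset_index(drop=True)                      # replace the mixed labels with 0..n-1
)
print(df3)
```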
merge also works:
df = df2.drop(columns='Sales').merge(df1, on=['Day','Month','Hour'], how='left').fillna(0)
df
Out[26]:
Day Month Hour Sales
0 10 7 1 12.0
1 10 7 2 14.0
2 10 7 3 10.0
3 10 7 4 0.0
4 10 7 5 18.0
5 10 7 6 12.0
6 10 7 7 22.0
7 10 7 8 0.0
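The Sales column comes out as 12.0, 14.0, ... because the left join produces NaN for the hours that are missing from df1, and NaN forces the column to float64. A minimal sketch, assuming integer sales are wanted, that fills only the Sales column and casts it back:

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

df1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
df2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

df3 = (
    df2.drop(columns="Sales")         # keep only the full grid of keys
    .merge(df1, on=keys, how="left")  # attach df1's Sales where the keys match
    .fillna({"Sales": 0})             # keys absent from df1 get 0 sales
    .astype({"Sales": "int64"})       # NaN had forced float64; cast back to integers
)
print(df3)
```

One caveat of this approach: because df2 is the left frame, any row that exists only in df1 would be dropped. The concat approach keeps such rows.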
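Alternatively, build a string key from the three columns and append only the DF2 rows whose key does not appear in DF1:
import pandas as pd

# Build a "Day-Month-Hour" key column in both frames (note: this modifies DF1 and DF2 in place)
for DF in [DF1, DF2]:
    DF['tag'] = (DF.Day.astype(str) + '-'
                 + DF.Month.astype(str) + '-'
                 + DF.Hour.astype(str))

# Keep only the DF2 rows whose key is not already in DF1
cond = ~DF2.tag.isin(DF1.tag)
DF3 = pd.concat([DF1, DF2[cond]], ignore_index=True)
del DF3['tag']
# The appended rows land at the end; sort to get the Day/Month/Hour order of DF3
DF3 = DF3.sort_values(['Day', 'Month', 'Hour'], ignore_index=True)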
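The string tag works, but the same "rows in DF2 but not in DF1" selection (an anti-join) can be done without helper columns by using merge with indicator=True, which adds a _merge column saying where each row was found. This is a swapped-in technique, not part of the original answer; a minimal sketch:

```python
import pandas as pd

keys = ["Day", "Month", "Hour"]

df1 = pd.DataFrame({"Day": [10] * 6, "Month": [7] * 6,
                    "Hour": [1, 2, 3, 5, 6, 7],
                    "Sales": [12, 14, 10, 18, 12, 22]})
df2 = pd.DataFrame({"Day": [10] * 8, "Month": [7] * 8,
                    "Hour": list(range(1, 9)), "Sales": [0] * 8})

# Left-join df2 against df1's key columns only; _merge is "both" or "left_only"
marked = df2.merge(df1[keys], on=keys, how="left", indicator=True)

# "left_only" rows are the keys present in df2 but absent from df1
missing = marked.loc[marked["_merge"] == "left_only"].drop(columns="_merge")

# Append the missing rows to df1 and restore the Day/Month/Hour order
df3 = pd.concat([df1, missing], ignore_index=True).sort_values(keys, ignore_index=True)
print(df3)
```

Joining against df1[keys] rather than all of df1 avoids pulling in a second Sales column, and it leaves df1 and df2 unmodified.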