Merging pandas dataframes based on index and date

I would like to combine two DataFrames so that I get the third DataFrame below. The result is the first DataFrame plus var2 from the second DataFrame, matched on each ticker/date combination that appears in the first one:

1st dataframe:

import pandas as pd

dict1 = [{'date': '2016-11-29', 'var1': 'x1'},
         {'date': '2016-11-29', 'var1': 'x2'},
         {'date': '2016-11-29', 'var1': 'x3'},
         {'date': '2016-11-29', 'var1': 'x4'},
         {'date': '2016-11-30', 'var1': 'x5'},
         {'date': '2016-11-30', 'var1': 'x6'}]
df1 = pd.DataFrame(dict1, index=['ge', 'jpm', 'fb', 'msft', 'ge', 'jpm'])

2nd dataframe:

dict2 = [{'date': '2016-11-29', 'var2': 'y1'},
         {'date': '2016-11-29', 'var2': 'y2'},
         {'date': '2016-11-29', 'var2': 'y3'},
         {'date': '2016-11-29', 'var2': 'y4'},
         {'date': '2016-11-30', 'var2': 'y5'},
         {'date': '2016-11-30', 'var2': 'y6'},
         {'date': '2016-11-30', 'var2': 'y7'},
         {'date': '2016-11-30', 'var2': 'y8'}]
df2 = pd.DataFrame(dict2, index=['aapl', 'msft', 'ge', 'jpm', 'aapl', 'msft', 'ge', 'jpm'])

3rd (target) dataframe:

import numpy as np

dict3 = [{'date': '2016-11-29', 'var1': 'x1', 'var2': 'y3'},
         {'date': '2016-11-29', 'var1': 'x2', 'var2': 'y4'},
         {'date': '2016-11-29', 'var1': 'x3', 'var2': np.nan},  # no 'fb' row in df2
         {'date': '2016-11-29', 'var1': 'x4', 'var2': 'y2'},
         {'date': '2016-11-30', 'var1': 'x5', 'var2': 'y7'},
         {'date': '2016-11-30', 'var1': 'x6', 'var2': 'y8'}]
df3 = pd.DataFrame(dict3, index=['ge', 'jpm', 'fb', 'msft', 'ge', 'jpm'])

Note that the rows of the two DataFrames are not aligned, so the merge must match rows on both the index (the ticker) and the date: together, index and date uniquely identify a row. For instance, the first row of the target DataFrame takes var2 from the 'ge' row of df2 dated '2016-11-29' (y3). Also, I only need the rows that are in df1; anything else in df2 (additional dates or tickers) is irrelevant.
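The requirement above can be pictured as a lookup keyed by the (ticker, date) pair. The sketch below is an illustrative alternative, not the method from the thread: it rebuilds df1 and df2 compactly, turns df2's var2 into a Series with a two-level (ticker, date) index, and uses Series.reindex with df1's own keys, so only df1's pairs are looked up and missing pairs become NaN. It assumes (ticker, date) is unique in df2, since reindex requires a unique source index.

```python
import pandas as pd

# The question's frames, rebuilt compactly (same values as dict1/dict2).
df1 = pd.DataFrame(
    {"date": ["2016-11-29"] * 4 + ["2016-11-30"] * 2,
     "var1": ["x1", "x2", "x3", "x4", "x5", "x6"]},
    index=["ge", "jpm", "fb", "msft", "ge", "jpm"],
)
df2 = pd.DataFrame(
    {"date": ["2016-11-29"] * 4 + ["2016-11-30"] * 4,
     "var2": ["y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8"]},
    index=["aapl", "msft", "ge", "jpm", "aapl", "msft", "ge", "jpm"],
)

# var2 keyed by (ticker, date): appending 'date' to the index gives a MultiIndex.
lookup = df2.set_index("date", append=True)["var2"]

# The same (ticker, date) keys for every row of df1, in df1's row order.
keys = pd.MultiIndex.from_arrays([df1.index, df1["date"]])

# reindex looks up only df1's keys; pairs absent from df2 (fb) become NaN.
# to_numpy() assigns by position, which avoids aligning on df1's duplicate labels.
result = df1.assign(var2=lookup.reindex(keys).to_numpy())
print(result)
```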

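You can reset the index so the tickers become an ordinary column (named 'index'), do a left merge on that column and the date column (how='left' keeps every row of df1 and nothing else), and then restore the index: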

(df1.reset_index()
    .merge(df2.reset_index(), on=['index', 'date'], how='left')
    .set_index('index'))
#             date var1 var2
#index                      
#ge     2016-11-29   x1   y3
#jpm    2016-11-29   x2   y4
#fb     2016-11-29   x3  NaN
#msft   2016-11-29   x4   y2
#ge     2016-11-30   x5   y7
#jpm    2016-11-30   x6   y8
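Because this merge treats (index, date) as a unique key in df2, it can be worth letting pandas check that assumption. The sketch below adds two real DataFrame.merge options: validate="many_to_one" makes the merge raise pandas.errors.MergeError if any (ticker, date) pair repeats in df2, and indicator=True adds a '_merge' column that flags df1 rows with no match. The frames are rebuilt compactly so the snippet runs on its own.

```python
import pandas as pd

# The question's frames, rebuilt compactly (same values as dict1/dict2).
df1 = pd.DataFrame(
    {"date": ["2016-11-29"] * 4 + ["2016-11-30"] * 2,
     "var1": ["x1", "x2", "x3", "x4", "x5", "x6"]},
    index=["ge", "jpm", "fb", "msft", "ge", "jpm"],
)
df2 = pd.DataFrame(
    {"date": ["2016-11-29"] * 4 + ["2016-11-30"] * 4,
     "var2": ["y1", "y2", "y3", "y4", "y5", "y6", "y7", "y8"]},
    index=["aapl", "msft", "ge", "jpm", "aapl", "msft", "ge", "jpm"],
)

merged = (
    df1.reset_index()
    .merge(
        df2.reset_index(),
        on=["index", "date"],
        how="left",
        validate="many_to_one",  # MergeError if an (index, date) pair repeats in df2
        indicator=True,          # adds '_merge': 'both' or 'left_only' here
    )
    .set_index("index")
)

# df1 rows whose (ticker, date) pair has no partner in df2.
unmatched = merged.index[merged["_merge"].to_numpy() == "left_only"].tolist()
print(unmatched)  # ['fb']

# Drop the helper column to get the target frame.
result = merged.drop(columns="_merge")
```

Leaving validate in place costs little and turns a silent row duplication (which a left merge would otherwise produce on repeated keys) into an explicit error.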
