
Merge two pandas dataframes with timeseries index

I have two pandas dataframes that I would like to merge/join together.

For example:

# required packages
import pandas as pd
import numpy as np

# create two sample time series that share the same 10-minute index
dates1 = pd.date_range('1/1/2000', periods=4, freq='10min')
dates2 = dates1
column_names = ['A', 'B', 'C']
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, columns=column_names)
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates2, columns=column_names)

# outer merge on the index; overlapping column names get the _x / _y suffixes
df3 = df1.merge(df2, how='outer', left_index=True, right_index=True, suffixes=('_x', '_y'))

From here I would like to merge the two datasets in the following manner (note the order of the columns):

                                              A_x       A_y       B_x       B_y       C_x       C_y
2000-01-01 00:00:00 2000-01-01 00:00:00 -0.572616 -0.867554 -0.382594  1.866238 -0.756318  0.564087
2000-01-01 00:10:00 2000-01-01 00:10:00 -0.814776 -0.458378  1.011491  0.196498 -0.523433 -0.296989
2000-01-01 00:20:00 2000-01-01 00:20:00 -0.617766  0.081141  1.405145 -1.183592  0.400720 -0.872507
2000-01-01 00:30:00 2000-01-01 00:30:00  1.083721  0.137422 -1.013840 -1.610531 -1.258841  0.142301

I would like to preserve both dataframe indexes by either creating a multi-index dataframe or creating a column for the second index. Would it be easier to use merge_ordered instead of merge or join?

Any help is appreciated.

I think you want to concat rather than merge:

In [11]: pd.concat([df1, df2], keys=["df1", "df2"], axis=1)
Out[11]:
                          df1                           df2
                            A         B         C         A         B         C
2000-01-01 00:00:00  1.621737  0.093015 -0.698715  0.319212  1.021829  1.707847
2000-01-01 00:10:00  0.780523 -1.169127 -1.097695 -0.444000  0.170283  1.652005
2000-01-01 00:20:00  1.560046 -0.196604 -1.260149  0.725005 -1.290074  0.606269
2000-01-01 00:30:00 -1.074419 -2.488055 -0.548531 -1.046327  0.895894  0.423743
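If the interleaved column order from the question (A_x, A_y, B_x, ...) matters, the two-level result can be rearranged afterwards. The following is only a sketch under a few assumptions: the keys "x" and "y" are chosen here to mimic the merge suffixes, and the flattened names are built by hand rather than by pandas.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2000-01-01", periods=4, freq="10min")
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates, columns=list("ABC"))
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates, columns=list("ABC"))

# Two-level columns: (key, original column name)
wide = pd.concat([df1, df2], keys=["x", "y"], axis=1)

# Put the original column name on top, then sort so that the two
# versions of A, B and C end up next to each other
interleaved = wide.swaplevel(axis=1).sort_index(axis=1)

# Optional: flatten the two levels into merge-style names such as "A_x"
interleaved.columns = [f"{col}_{key}" for col, key in interleaved.columns]

print(interleaved.columns.tolist())
```

Sorting the columns works here because the names A, B, C are already in alphabetical order; with arbitrary column names an explicit column list may be the safer choice.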

Using concat

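# reset_index() turns each unnamed index into an 'index' column,
# which add_suffix() then renames to 'index_x' / 'index_y'
pd.concat(
    [df1.reset_index().add_suffix('_x'),
     df2.reset_index().add_suffix('_y')],
    axis=1,
).set_index(['index_x', 'index_y'])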

                                              A_x       B_x       C_x       A_y       B_y       C_y
index_x             index_y
2000-01-01 00:00:00 2000-01-01 00:00:00 -1.437311 -1.414127  0.344057 -0.533669 -0.260106 -1.316879
2000-01-01 00:10:00 2000-01-01 00:10:00  0.662025  1.860933 -0.485169 -0.825603 -0.973267 -0.760737
2000-01-01 00:20:00 2000-01-01 00:20:00 -0.300213  0.047812 -2.279631 -0.739694 -1.872261  2.281126
2000-01-01 00:30:00 2000-01-01 00:30:00  1.499468  0.633967 -1.067881  0.174793  1.197813 -0.879132
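Two things are worth noting about this approach: rows are paired by position rather than by timestamp, and the columns come out grouped by frame instead of interleaved as in the question. The sketch below illustrates both; the 5-minute offset on the second index is a made-up assumption used only to make the two index levels visibly different.

```python
import numpy as np
import pandas as pd

dates1 = pd.date_range("2000-01-01", periods=4, freq="10min")
dates2 = dates1 + pd.Timedelta(minutes=5)  # hypothetical offset, for illustration
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, columns=list("ABC"))
df2 = pd.DataFrame(np.random.randn(4, 3), index=dates2, columns=list("ABC"))

# Row i of df1 is paired with row i of df2, whatever their timestamps are
paired = pd.concat(
    [df1.reset_index().add_suffix("_x"), df2.reset_index().add_suffix("_y")],
    axis=1,
).set_index(["index_x", "index_y"])

# Reorder to A_x, A_y, B_x, B_y, C_x, C_y
wanted = [f"{name}{suffix}" for name in df1.columns for suffix in ("_x", "_y")]
ordered = paired[wanted]

print(ordered)
```

If the two frames have different lengths, the shorter one is padded with missing values, so this pairing is probably only meaningful when both frames have the same number of rows in matching order.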

Merging on the index will indeed combine both indexes into a single one.

You can copy the index of df2 into an extra column before you merge:

df2["index_2"] = df2.index

This adds a column to the final result that holds the index value each row had in df2.

Please note that this column only differs from the merged index when a timestamp does not appear in df2, in which case it will be null, so I'm not sure I understand your final goal here.
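To make the null case concrete, here is a small sketch in which df2 is assumed, purely for illustration, to be missing the last timestamp. The column name index_2 follows the suggestion above.

```python
import numpy as np
import pandas as pd

dates1 = pd.date_range("2000-01-01", periods=4, freq="10min")
dates2 = dates1[:3]  # hypothetical: df2 lacks the 00:30 row
df1 = pd.DataFrame(np.random.randn(4, 3), index=dates1, columns=list("ABC"))
df2 = pd.DataFrame(np.random.randn(3, 3), index=dates2, columns=list("ABC"))

# Keep a copy of df2's index as a regular column before merging
df2["index_2"] = df2.index

df3 = df1.merge(df2, how="outer", left_index=True, right_index=True,
                suffixes=("_x", "_y"))

# index_2 equals the merged index wherever df2 had a row, and is NaT elsewhere
print(df3[["A_x", "A_y", "index_2"]])
```

The index_2 column is not part of the overlapping column names, so it keeps its name without a suffix.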
