
Merging multiple pandas datasets with non-unique index

I have several similarly structured pandas dataframes stored in a dictionary. I access a dataframe in the following way.

ex_dict['df1']
date        df1price1   df1price2
10-20-2015     100         150
10-21-2015      90         100

I want to merge all of these dataframes into one dataframe by date. The dates are overlapping, but not all dataframes include all dates.

I need to go from this

df1
date        df1price1   df1price2
10-20-2015     100         150
10-21-2015      90         100
10-22-2015     100         140

df2
date        df2price1   df2price2
10-20-2015     110         140
10-21-2015      90         110
10-23-2015     110         120

df3
date        df3price1   df3price2
10-20-2015     100         150
10-22-2015      90         100
10-23-2015      80         130

to this:

df_all
date        df1price1   df1price2 ... df3price1   df3price2
10-20-2015     100         150    ...    100         150
10-21-2015      90         100    ...    NaN         NaN
10-22-2015     100         140    ...     90         100
10-23-2015     NaN         NaN    ...     80         130

I've tried lots of things, but I can't get it to work, short of repeatedly merging two at a time into a new dataframe and then merging the next one onto that. The number of dataframes I need to merge varies between 4 and 10, so I need a way to do this automatically (which is why I thought passing a dict might work).

Any help on this would be incredibly appreciated.
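To reproduce the answers, the three sample frames from the question can be built and stored in a dictionary. This is a minimal sketch: the string keys 'df1', 'df2', 'df3' are an assumption about how ex_dict is keyed, and the dates are kept as plain strings exactly as shown.

```python
import pandas as pd

# Sample data from the question; dates kept as strings exactly as shown.
df1 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-22-2015"],
    "df1price1": [100, 90, 100],
    "df1price2": [150, 100, 140],
})
df2 = pd.DataFrame({
    "date": ["10-20-2015", "10-21-2015", "10-23-2015"],
    "df2price1": [110, 90, 110],
    "df2price2": [140, 110, 120],
})
df3 = pd.DataFrame({
    "date": ["10-20-2015", "10-22-2015", "10-23-2015"],
    "df3price1": [100, 90, 80],
    "df3price2": [150, 100, 130],
})

# Assumed layout of the question's dictionary: string name -> DataFrame.
ex_dict = {"df1": df1, "df2": df2, "df3": df3}

print(ex_dict["df1"])
```

Because the dates are MM-DD-YYYY strings, their sort order only matches chronological order while all dates share the same year. Parsing them first with pd.to_datetime(df["date"], format="%m-%d-%Y") is safer if the data spans several years.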

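You can use pd.concat followed by a groupby('date') to flatten the result. Since each price column comes from exactly one dataframe, max() simply picks the single non-NaN value for each date.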

In [22]: pd.concat([df1,df2,df3]).groupby('date').max()
Out[22]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130
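The same concat/groupby call can take the frames straight from the dictionary, so the number of frames (4 to 10 in the question) does not need to be known in advance. A minimal sketch, assuming ex_dict maps names to frames that each have a date column and otherwise distinct column names; the dictionary below is a hypothetical stand-in for the question's.

```python
import pandas as pd

# Hypothetical dictionary mirroring the question's ex_dict.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100], "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110], "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80], "df3price2": [150, 100, 130]}),
}

# Stack every frame vertically (the column sets are unioned and missing cells
# become NaN), then collapse rows sharing a date. Each price column has at most
# one non-NaN value per date, so max() keeps exactly that value.
df_all = pd.concat(list(ex_dict.values())).groupby("date").max()
print(df_all)
```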

Edit: As BrenBarn points out in the comments, you can use concat(axis=1) if you set the join column as the index of your dataframes:

df1.index = df1.date
df2.index = df2.date
df3.index = df3.date

In [44]: pd.concat([df1,df2,df3],axis=1)
Out[44]:
                  date  df1price1  df1price2        date  df2price1  \
10-20-2015  10-20-2015        100        150  10-20-2015        110
10-21-2015  10-21-2015         90        100  10-21-2015         90
10-22-2015  10-22-2015        100        140         NaN        NaN
10-23-2015         NaN        NaN        NaN  10-23-2015        110

            df2price2        date  df3price1  df3price2
10-20-2015        140  10-20-2015        100        150
10-21-2015        110         NaN        NaN        NaN
10-22-2015        NaN  10-22-2015         90        100
10-23-2015        120  10-23-2015         80        130
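Assigning the index directly (df1.index = df1.date) keeps the original date column, which is why it appears three times in the output above. A sketch that uses set_index('date') instead, so the key moves into the index and is not repeated, and that takes the frames from the dictionary; the ex_dict layout is assumed. Because concat(axis=1) aligns rows on the index, this relies on each date appearing at most once within a frame.

```python
import pandas as pd

# Hypothetical dictionary mirroring the question's ex_dict.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100], "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110], "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80], "df3price2": [150, 100, 130]}),
}

# set_index("date") moves the key into the index and drops it as a column.
# concat(axis=1) then places the frames side by side, aligning rows on the
# index with an outer join, so dates missing from a frame become NaN.
df_all = pd.concat(
    [df.set_index("date") for df in ex_dict.values()], axis=1
).sort_index()
print(df_all)
```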

You could also chain several merge calls on the date column:

df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')

In [107]: df1.merge(df2, on='date', how='outer').merge(df3, on='date', how='outer').set_index('date')
Out[107]:
            df1price1  df1price2  df2price1  df2price2  df3price1  df3price2
date
10-20-2015        100        150        110        140        100        150
10-21-2015         90        100         90        110        NaN        NaN
10-22-2015        100        140        NaN        NaN         90        100
10-23-2015        NaN        NaN        110        120         80        130

Some explanation: first, df1 and df2 are merged on the date column with an outer join. The resulting dataframe is then merged with df3 using the same arguments. Finally, date is set as the index of the result. If your dataframes have the dates as their index, you could first call reset_index on each of them and then merge on the name of the resulting date column.
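Writing out one .merge per frame does not scale to a varying number of frames, but the same pairwise outer merge can be folded over the dictionary with functools.reduce, which automates the "merge two at a time" approach from the question. A minimal sketch, assuming every frame in ex_dict has a date column and otherwise distinct column names; merge_on_date is a hypothetical helper name. If the dates live in a named date index instead, calling reset_index() on each frame first gives them that column.

```python
from functools import reduce

import pandas as pd

# Hypothetical dictionary mirroring the question's ex_dict.
ex_dict = {
    "df1": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-22-2015"],
                         "df1price1": [100, 90, 100], "df1price2": [150, 100, 140]}),
    "df2": pd.DataFrame({"date": ["10-20-2015", "10-21-2015", "10-23-2015"],
                         "df2price1": [110, 90, 110], "df2price2": [140, 110, 120]}),
    "df3": pd.DataFrame({"date": ["10-20-2015", "10-22-2015", "10-23-2015"],
                         "df3price1": [100, 90, 80], "df3price2": [150, 100, 130]}),
}


def merge_on_date(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: outer-merge two frames on their 'date' column."""
    # An outer join keeps dates present in either frame; gaps become NaN.
    return left.merge(right, on="date", how="outer")


# reduce applies the merge cumulatively: merge(merge(df1, df2), df3), ...
df_all = reduce(merge_on_date, ex_dict.values()).set_index("date").sort_index()
print(df_all)
```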
