
Pandas - Merge two df's on non-unique date (outer join)

I have two df's that I would like to combine in a slightly unusual way.

The df's in question:

df1:
Index      colA 
2012-01-02  1
2012-01-05  2
2012-01-10  3
2012-01-10  4

and then df2:

Index      colB
2012-01-01  6
2012-01-05  7
2012-01-08  8
2012-01-10  9

Output:

Index      colA colB
2012-01-01  NaN   6
2012-01-02  1     NaN
2012-01-05  2     7
2012-01-08  NaN   8
2012-01-10  3     9
2012-01-10  4     NaN
  • Happy to have the NaN output if there is no matching date between the df's.
  • If there is a matching date I would like to return both columns.
  • A single date can have, e.g., 20 rows in df1 and 15 rows in df2. In that case it should match off the first 15 (I don't care about ordering) and then return NaN in colB for the remaining 5 rows from df1.

When I try to do this myself with pd.merge() and similar, I can't get this result, because the dates are not unique and so can't act as a one-to-one key.

Any suggestions how to get the intended behavior?

Thanks

You may need to create a helper key with cumcount, so that the n-th row for a date in df1 lines up with the n-th row for the same date in df2:

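# Number the rows within each date (0, 1, 2, ...) so duplicate dates pair up one-to-one
df1 = df1.assign(key=df1.groupby('Index').cumcount())
df2 = df2.assign(key=df2.groupby('Index').cumcount())
# Outer-merge on the date plus the helper key, then drop the helper
fdf = df1.merge(df2, on=['Index', 'key'], how='outer').drop(columns='key').sort_values('Index')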
fdf
Out[104]: 
        Index  colA  colB
4  2012-01-01   NaN   6.0
0  2012-01-02   1.0   NaN
1  2012-01-05   2.0   7.0
5  2012-01-08   NaN   8.0
2  2012-01-10   3.0   9.0
3  2012-01-10   4.0   NaN
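
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes `Index` is an ordinary column holding the dates (as in the output above); the function name `merge_by_occurrence` and the temporary column `_n` are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample frames from the question; 'Index' is assumed to be a regular column.
df1 = pd.DataFrame({
    "Index": pd.to_datetime(["2012-01-02", "2012-01-05", "2012-01-10", "2012-01-10"]),
    "colA": [1, 2, 3, 4],
})
df2 = pd.DataFrame({
    "Index": pd.to_datetime(["2012-01-01", "2012-01-05", "2012-01-08", "2012-01-10"]),
    "colB": [6, 7, 8, 9],
})


def merge_by_occurrence(left, right, on="Index"):
    """Outer-merge, pairing the n-th row of each date on the left
    with the n-th row of the same date on the right."""
    # cumcount() numbers rows within each date group: 0, 1, 2, ...
    left = left.assign(_n=left.groupby(on).cumcount())
    right = right.assign(_n=right.groupby(on).cumcount())
    merged = left.merge(right, on=[on, "_n"], how="outer")
    # Sort explicitly so the row order does not depend on the pandas version.
    return merged.sort_values([on, "_n"]).drop(columns="_n").reset_index(drop=True)


result = merge_by_occurrence(df1, df2)
print(result)
```

Because the helper counter is part of the merge key, the second 2012-01-10 row in df1 has no partner in df2 and gets NaN in colB rather than repeating the 9.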

Using join() should work:

df1.join(df2, how='outer', sort=True)
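
It is worth checking this against the question's data before relying on it. The sketch below assumes the dates are moved into the index (which is what join() aligns on); the names `a`, `b`, `joined` and `dup` are just for illustration.

```python
import pandas as pd

# Same data as the question, with the dates as the index.
a = pd.DataFrame(
    {"colA": [1, 2, 3, 4]},
    index=pd.to_datetime(["2012-01-02", "2012-01-05", "2012-01-10", "2012-01-10"]),
)
b = pd.DataFrame(
    {"colB": [6, 7, 8, 9]},
    index=pd.to_datetime(["2012-01-01", "2012-01-05", "2012-01-08", "2012-01-10"]),
)

joined = a.join(b, how="outer", sort=True)
print(joined)

# Select the rows for the duplicated date and inspect colB.
dup = joined[joined.index == pd.Timestamp("2012-01-10")]
print(dup["colB"].tolist())
```

With this data, both 2012-01-10 rows from df1 are paired with the single 2012-01-10 row from df2, so colB is 9 on both rows instead of 9 and NaN. If the one-to-one pairing from the question is required, a per-date counter such as cumcount is still needed.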
