
Merging Pandas Dataframe to Multiindex with Date

I have several dataframes from multiple sources, each with a date index, which I want to merge into a single multiindex dataframe. I'm struggling to figure out how to do this.

Starting with two dataframes:

Source 1

+---------------------+------+------+-----+-------+
|        date         | open | high | low | close |
+---------------------+------+------+-----+-------+
| 2018-04-04 20:00:00 | xxx  | xxx  | xxx | xxx   |
| 2018-04-04 21:00:00 | xxx  | xxx  | xxx | xxx   |
| 2018-04-04 22:00:00 | xxx  | xxx  | xxx | xxx   |
+---------------------+------+------+-----+-------+

Source 2

+---------------------+------+------+-----+-------+
|        date         | open | high | low | close |
+---------------------+------+------+-----+-------+
| 2018-04-04 20:00:00 | xxx  | xxx  | xxx | xxx   |
| 2018-04-04 21:00:00 | xxx  | xxx  | xxx | xxx   |
| 2018-04-04 22:00:00 | xxx  | xxx  | xxx | xxx   |
+---------------------+------+------+-----+-------+

I'd like to merge them so that the result is multiindexed on the date and on the source (source1 or source2).

Something like:

+---------------------+---------+------+-----+-------+
|                     |         |      |     |       |
+---------------------+---------+------+-----+-------+
| 2018-04-04 20:00:00 | source1 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
|                     | source2 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
| 2018-04-04 21:00:00 | source1 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
|                     | source2 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
| 2018-04-04 22:00:00 | source1 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
|                     | source2 |      |     |       |
|                     | open    | high | low | close |
|                     | xxx     | xxx  | xxx | xxx   |
+---------------------+---------+------+-----+-------+

Can anyone help?

Thanks!

You can use concat and specify the keys, e.g.:

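# keys adds an outer index level holding the source label; reset_index(level=0) moves it into a column named level_0
df3 = pd.concat([df1, df2], keys=['source1', 'source2']).reset_index(level=0)

# index on date plus source label, then sort so the rows for each timestamp sit together
df3 = df3.set_index(['date', 'level_0']).sort_index(level='date')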



                                open    high    low    close
 date                 level_0                                
 2018-04-04 20:00:00  source1   xxx     xxx     xxx    xxx   
                      source2   xxx     xxx     xxx    xxx   
 2018-04-04 21:00:00  source1   xxx     xxx     xxx    xxx   
                      source2   xxx     xxx     xxx    xxx   
 2018-04-04 22:00:00  source1   xxx     xxx     xxx    xxx   
                      source2   xxx     xxx     xxx    xxx   
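
For anyone who wants to run this end to end, here is a self-contained sketch of the snippet above. The numeric values are made up (the question only shows "xxx"), and it assumes date is an ordinary column in both frames rather than the index.

```python
import pandas as pd

# Hypothetical sample data; the original post elides the real values as "xxx".
dates = pd.to_datetime(
    ["2018-04-04 20:00:00", "2018-04-04 21:00:00", "2018-04-04 22:00:00"]
)
df1 = pd.DataFrame({"date": dates, "open": [1.0, 2.0, 3.0], "high": [1.5, 2.5, 3.5],
                    "low": [0.5, 1.5, 2.5], "close": [1.2, 2.2, 3.2]})
df2 = pd.DataFrame({"date": dates, "open": [10.0, 20.0, 30.0], "high": [10.5, 20.5, 30.5],
                    "low": [9.5, 19.5, 29.5], "close": [10.2, 20.2, 30.2]})

# keys adds an outer index level with the source label;
# reset_index(level=0) turns that unnamed level into a column called "level_0".
df3 = pd.concat([df1, df2], keys=["source1", "source2"]).reset_index(level=0)

# Index on (date, source label) and sort so each timestamp groups its sources.
df3 = df3.set_index(["date", "level_0"]).sort_index(level="date")
print(df3)
```

The name level_0 is only what pandas assigns to an unnamed index level when it is reset; renaming it afterwards (for example with rename_axis) is optional.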

Use concat with keys, after calling set_index on each frame to get a DatetimeIndex, then swaplevel and sort_index:

df = (pd.concat([df1.set_index('date'),df2.set_index('date')], keys=['source1','source2'])
        .swaplevel(0,1)
        .sort_index())
print(df)
                            open high  low close
date                                            
2018-04-04 20:00:00 source1  xxx  xxx  xxx   xxx
                    source2  xxx  xxx  xxx   xxx
2018-04-04 21:00:00 source1  xxx  xxx  xxx   xxx
                    source2  xxx  xxx  xxx   xxx
2018-04-04 22:00:00 source1  xxx  xxx  xxx   xxx
                    source2  xxx  xxx  xxx   xxx
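
As a follow-up, here is a rough sketch of how the merged frame might be queried once it is built. The sample numbers are invented, and passing names=["source", "date"] to concat is an addition of mine to label both index levels; the answer above leaves the source level unnamed.

```python
import pandas as pd

# Hypothetical sample data; the original post elides the real values as "xxx".
dates = pd.to_datetime(
    ["2018-04-04 20:00:00", "2018-04-04 21:00:00", "2018-04-04 22:00:00"]
)
df1 = pd.DataFrame({"date": dates, "open": [1.0, 2.0, 3.0], "high": [1.5, 2.5, 3.5],
                    "low": [0.5, 1.5, 2.5], "close": [1.2, 2.2, 3.2]})
df2 = pd.DataFrame({"date": dates, "open": [10.0, 20.0, 30.0], "high": [10.5, 20.5, 30.5],
                    "low": [9.5, 19.5, 29.5], "close": [10.2, 20.2, 30.2]})

# names labels both levels of the resulting index (an assumption added here):
# "source" comes from keys, "date" from each frame's own index.
df = (pd.concat([df1.set_index("date"), df2.set_index("date")],
                keys=["source1", "source2"], names=["source", "date"])
        .swaplevel(0, 1)
        .sort_index())

# Cross-section: every row from one source, indexed by date alone.
only_source1 = df.xs("source1", level="source")

# Every source for a single timestamp, indexed by source.
at_21 = df.loc[pd.Timestamp("2018-04-04 21:00:00")]

print(only_source1)
print(at_21)
```

With named levels, xs and loc can refer to "source" and "date" directly instead of relying on level positions.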


 