
How to merge a MultiIndex column dataframe

I want to merge static data with time-varying data.

First dataframe:

import pandas as pd

a_columns = pd.MultiIndex.from_product([["A","B","C"],["1","2"]])
a_index = pd.date_range("20100101","20110101",freq="BM")  # business month end ("BME" in pandas >= 2.2)
a = pd.DataFrame(columns=a_columns,index=a_index)  # A

Second dataframe:

b_columns = ["3","4","5"]
b_index = ["A","B","C"]
b = pd.DataFrame(columns=b_columns,index=b_index)

How do I join these two? My desired dataframe has the same form as A (the first dataframe), but with additional columns.

Thanks!

I think you need to reshape b with stack and then create a DataFrame with to_frame, transposed into a single row. concat needs a DatetimeIndex to align on, so the new row's index label is taken from the first value of the index of a.

Finally, concat + sort_index:

import pandas as pd

# added some data: every cell of a is 2
a_columns = pd.MultiIndex.from_product([["A","B","C"],["1","2"]])
a_index = pd.date_range("20100101","20110101",freq="BM")
a = pd.DataFrame(2,columns=a_columns,index=a_index)  # A

# added some data: every cell of b is 1
b_columns = ["3","4","5"]
b_index = ["A","B","C"]
b = pd.DataFrame(1,columns=b_columns,index=b_index)

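# stack b into a Series, turn it into one column named with a's first date, then transpose to one row
c = b.stack().to_frame(a.index[0]).T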
print (c)
            A        B        C      
            3  4  5  3  4  5  3  4  5
2010-01-29  1  1  1  1  1  1  1  1  1
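To see what each step of that one-liner does, here is a minimal sketch that rebuilds the toy inputs and keeps the intermediate objects. The names s and col are illustrative only, and pd.offsets.BusinessMonthEnd() is used in place of the "BM" string alias (an assumption made just to avoid the alias being renamed in newer pandas versions).

```python
import pandas as pd

# Toy inputs from the answer; BusinessMonthEnd() is the offset behind the "BM" alias
a = pd.DataFrame(2,
                 columns=pd.MultiIndex.from_product([["A", "B", "C"], ["1", "2"]]),
                 index=pd.date_range("20100101", "20110101", freq=pd.offsets.BusinessMonthEnd()))
b = pd.DataFrame(1, columns=["3", "4", "5"], index=["A", "B", "C"])

# 1) stack moves b's columns into the row index: a Series keyed by (row, column)
s = b.stack()
print(s.index[:3].tolist())

# 2) to_frame turns the Series into a single column, named with a's first date
col = s.to_frame(a.index[0])

# 3) .T flips it into one row labelled with that date,
#    with b's (row, column) pairs as a two-level column index
c = col.T
print(c.index[0], c.shape)
```

Because the row label is one of a's own dates, concat can line it up with the matching row of a.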

# align on the date index (outer join); only the first date has values in the new columns
d = pd.concat([a,c], axis=1).sort_index(axis=1)
print (d)
            A                    B                    C                  
            1  2    3    4    5  1  2    3    4    5  1  2    3    4    5
2010-01-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-02-26  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-03-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-04-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-05-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-06-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-07-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-08-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-09-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-10-29  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-11-30  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
2010-12-31  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN  2  2  NaN  NaN  NaN
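The NaNs (and the switch to 1.0) come from how concat with axis=1 aligns rows: it takes the union of both indexes, and c only has the first date. A minimal sketch checking that, under the same toy inputs (again using pd.offsets.BusinessMonthEnd() as a stand-in for "BM"):

```python
import pandas as pd

# Same toy inputs as in the answer
a = pd.DataFrame(2,
                 columns=pd.MultiIndex.from_product([["A", "B", "C"], ["1", "2"]]),
                 index=pd.date_range("20100101", "20110101", freq=pd.offsets.BusinessMonthEnd()))
b = pd.DataFrame(1, columns=["3", "4", "5"], index=["A", "B", "C"])
c = b.stack().to_frame(a.index[0]).T

# axis=1 puts the frames side by side and aligns rows on the index;
# dates that c lacks get NaN in c's columns
d = pd.concat([a, c], axis=1).sort_index(axis=1)

# NaN forces the added columns to float64, while a's columns need no reindexing
print(d[c.columns].dtypes.unique())
print(d[c.columns].notna().sum().unique())  # non-missing count per added column
```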

Finally, if you need to replace the NaNs only in the added columns with the values from the first row:

# forward-fill only the added columns, leaving a's columns untouched
d[c.columns] = d[c.columns].ffill()
print (d)
            A                    B                    C                  
            1  2    3    4    5  1  2    3    4    5  1  2    3    4    5
2010-01-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-02-26  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-03-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-04-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-05-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-06-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-07-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-08-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-09-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-10-29  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-11-30  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
2010-12-31  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0  2  2  1.0  1.0  1.0
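Restricting ffill to c.columns matters when the time-varying data has real gaps of its own. In this minimal sketch a hypothetical NaN is planted in a's ("A", "1") column (a is built as float so the NaN can be stored); filling the whole frame would hide that gap, while the targeted fill keeps it.

```python
import numpy as np
import pandas as pd

# Toy inputs as in the answer, but a is float so a NaN can be planted in it
a = pd.DataFrame(2.0,
                 columns=pd.MultiIndex.from_product([["A", "B", "C"], ["1", "2"]]),
                 index=pd.date_range("20100101", "20110101", freq=pd.offsets.BusinessMonthEnd()))
a.iloc[3, 0] = np.nan  # hypothetical gap in the time-varying column ("A", "1")
b = pd.DataFrame(1, columns=["3", "4", "5"], index=["A", "B", "C"])

c = b.stack().to_frame(a.index[0]).T
d = pd.concat([a, c], axis=1).sort_index(axis=1)

# Filling the whole frame would also paper over the gap in ("A", "1") ...
filled_all = d.ffill()
# ... whereas filling only c.columns leaves a's own NaN where it was
d[c.columns] = d[c.columns].ffill()

print(int(d[("A", "1")].isna().sum()), int(filled_all[("A", "1")].isna().sum()))
```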

A similar solution with reindex:

# build the one-row frame, then reindex it to all of a's dates, forward-filling the values
c = b.stack().to_frame(a.index[0]).T.reindex(a.index, method='ffill')
print (c)
            A        B        C      
            3  4  5  3  4  5  3  4  5
2010-01-29  1  1  1  1  1  1  1  1  1
2010-02-26  1  1  1  1  1  1  1  1  1
2010-03-31  1  1  1  1  1  1  1  1  1
2010-04-30  1  1  1  1  1  1  1  1  1
2010-05-31  1  1  1  1  1  1  1  1  1
2010-06-30  1  1  1  1  1  1  1  1  1
2010-07-30  1  1  1  1  1  1  1  1  1
2010-08-31  1  1  1  1  1  1  1  1  1
2010-09-30  1  1  1  1  1  1  1  1  1
2010-10-29  1  1  1  1  1  1  1  1  1
2010-11-30  1  1  1  1  1  1  1  1  1
2010-12-31  1  1  1  1  1  1  1  1  1

# c already has every date of a, so concat introduces no NaN
d = pd.concat([a,c], axis=1).sort_index(axis=1)
print (d)
            A              B              C            
            1  2  3  4  5  1  2  3  4  5  1  2  3  4  5
2010-01-29  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-02-26  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-03-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-04-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-05-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-06-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-07-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-08-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-09-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-10-29  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-11-30  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
2010-12-31  2  2  1  1  1  2  2  1  1  1  2  2  1  1  1
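Both solutions give the same values; the visible difference is the dtype of the added columns. A minimal sketch comparing them on the toy inputs (the names row, d1, d2 are illustrative; check_freq=False is passed because the concat-first result may not carry the index freq):

```python
import pandas as pd

a = pd.DataFrame(2,
                 columns=pd.MultiIndex.from_product([["A", "B", "C"], ["1", "2"]]),
                 index=pd.date_range("20100101", "20110101", freq=pd.offsets.BusinessMonthEnd()))
b = pd.DataFrame(1, columns=["3", "4", "5"], index=["A", "B", "C"])
row = b.stack().to_frame(a.index[0]).T

# Approach 1: concat first, then forward-fill the added columns
d1 = pd.concat([a, row], axis=1).sort_index(axis=1)
d1[row.columns] = d1[row.columns].ffill()

# Approach 2: reindex the one-row frame to a's dates first, then concat
c2 = row.reindex(a.index, method="ffill")
d2 = pd.concat([a, c2], axis=1).sort_index(axis=1)

# Same values; approach 1 passes through NaN, so its added columns end up float
pd.testing.assert_frame_equal(d1, d2, check_dtype=False, check_freq=False)
print(d1[row.columns].dtypes.unique(), d2[row.columns].dtypes.unique())
```

The reindex route avoids the intermediate NaNs entirely, which is why the integers survive.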
