
Merging two dataframes with hierarchical columns

This is my first time using a MultiIndex in pandas, and I need some help merging two dataframes with hierarchical columns. Here are my two dataframes:

import numpy as np
import pandas as pd
col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4,6]),columns=col_index, index=range(4))

     a         b         c     
     w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0  1.0

df2 = pd.DataFrame(np.zeros([2,6]),columns=col_index, index=range(2))

     a         b         c     
     w    x    w    x    w    x
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0

When I use the merge method, I get the following result:

pd.merge(df1, df2, how='left', suffixes=('', '_2'), left_index=True, right_index=True)

     a         b         c       a_2       b_2       c_2     
     w    x    w    x    w    x    w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN

But I want to merge the two dataframes at the lower level, so that the suffixes are applied to ['w', 'x'], like this:

     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
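
To see where the suffix ends up after the plain merge, one option is to look at each column level of the result separately. This is only a small diagnostic sketch; the names plain, outer and inner are introduced here for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# Same call as in the question: left merge on the row index.
plain = pd.merge(df1, df2, how='left', suffixes=('', '_2'),
                 left_index=True, right_index=True)

# Distinct labels per column level, in order of first appearance.
outer = plain.columns.get_level_values(0).unique().tolist()
inner = plain.columns.get_level_values(1).unique().tolist()
print(outer)  # labels of the top level (a, b, c, ...)
print(inner)  # labels of the lower level (w, x, ...)
```

Comparing the two lists with the merged table above shows which level received the '_2' suffix.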

You can use join or merge together with swaplevel() or reorder_levels(), then call .sort_index() with axis=1 to sort the columns.

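  • .join() is the better choice when you are merging on the index like this.
  • .swaplevel() is the better choice when there are two levels (as in this example), while .reorder_levels() is the better choice for 3 or more levels.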
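
For the case of 3 or more levels, the same idea might look roughly like the sketch below. The three-level column layout and the names left, right, to_front and to_back are made up for illustration and do not come from the question. The sketch assumes, as in the merge output shown in the question, that the suffix is attached to the labels of the outermost column level.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level columns: group / subgroup / measure.
cols = pd.MultiIndex.from_product([['a', 'b'], ['p', 'q'], ['w', 'x']])
left = pd.DataFrame(np.ones((4, 8)), columns=cols, index=range(4))
right = pd.DataFrame(np.zeros((2, 8)), columns=cols, index=range(2))

# Bring the innermost level (position 2) to the front before joining.
to_front = [2, 0, 1]
# Inverse permutation: puts the levels back in their original order.
to_back = [1, 2, 0]

merged = (left.reorder_levels(to_front, axis=1)
          .join(right.reorder_levels(to_front, axis=1), rsuffix='_2')
          .reorder_levels(to_back, axis=1)
          .sort_index(axis=1))
print(merged.columns.tolist()[:4])
```

With more than two levels, reorder_levels() lets you state the full permutation in a single call, which is harder to express with pairwise swaplevel() calls.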

Below are the 4 combinations of these methods. For this particular example, I think .join() / .swaplevel() is the cleanest (see the last example):

df3 = (df1.reorder_levels([1,0],axis=1)
       .join(df2.reorder_levels([1,0],axis=1), rsuffix='_2')
       .reorder_levels([1,0],axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[1]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.reorder_levels([1,0],axis=1),
                df2.reorder_levels([1,0],axis=1),
                how='left', left_index=True, right_index=True, suffixes = ('', '_2'))
                .reorder_levels([1,0],axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[2]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.swaplevel(axis=1),
                df2.swaplevel(axis=1),
                how='left', left_index=True, right_index=True, suffixes = ('', '_2'))
                .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[3]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (df1.swaplevel(i=0,j=1, axis=1)
       .join(df2.swaplevel(axis=1), rsuffix='_2')
       .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[4]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
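
As a sanity check, two of the variants above can be compared programmatically. This is a minimal sketch that rebuilds the frames from the question. The names join_swap and merge_reorder are introduced here for illustration, and sort_index(axis=1) is called without the level argument, which is assumed to give the same column order for this data.

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# Variant 1: join + swaplevel.
join_swap = (df1.swaplevel(axis=1)
             .join(df2.swaplevel(axis=1), rsuffix='_2')
             .swaplevel(axis=1)
             .sort_index(axis=1))

# Variant 2: merge + reorder_levels.
merge_reorder = (pd.merge(df1.reorder_levels([1, 0], axis=1),
                          df2.reorder_levels([1, 0], axis=1),
                          how='left', left_index=True, right_index=True,
                          suffixes=('', '_2'))
                 .reorder_levels([1, 0], axis=1)
                 .sort_index(axis=1))

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(join_swap, merge_reorder)
```

If the comparison passes, the choice between the variants comes down to readability.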

