Merging two dataframes with hierarchical columns
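This is my first time working with a MultiIndex in pandas, and I need some help merging two dataframes with hierarchical columns. Here are my two dataframes: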
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
     a         b         c
     w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0  1.0
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))
     a         b         c
     w    x    w    x    w    x
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0
When I use the merge method, I get the following result:
pd.merge(df1, df2, how='left', suffixes=('', '_2'), left_index=True, right_index=True)
     a         b         c       a_2       b_2       c_2
     w    x    w    x    w    x    w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
But I would like to merge the two dataframes at the lower level, so that the suffixes are applied to ['w', 'x'], like this:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
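You can use join or merge together with swaplevel() or reorder_levels(), then call .sort_index() with axis=1 to sort the columns.

.join() is probably the better fit, since it joins on the index by default. .swaplevel() is better when there are two levels (as in this example), while .reorder_levels() is better for 3 or more levels. Below are the 4 combinations of these methods. For this particular example, I think .join() / .swaplevel() is the cleanest (see the last example):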
# 1. join + reorder_levels
df3 = (df1.reorder_levels([1, 0], axis=1)
          .join(df2.reorder_levels([1, 0], axis=1), rsuffix='_2')
          .reorder_levels([1, 0], axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[1]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
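To make it easier to see what each link in the chain contributes, here is a minimal sketch that splits the same join / reorder_levels approach into separate steps and inspects the column levels along the way. The intermediate names (left, right, joined, top_after_join, result) are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# Step 1: move the inner level ('w', 'x') to the top, because the suffix
# is attached to the outermost column level.
left = df1.reorder_levels([1, 0], axis=1)
right = df2.reorder_levels([1, 0], axis=1)

# Step 2: left join on the row index; overlapping columns coming from
# `right` receive the '_2' suffix on their top level.
joined = left.join(right, rsuffix='_2')
top_after_join = sorted(set(joined.columns.get_level_values(0)))

# Step 3: put the levels back in their original order and sort the columns
# so that each 'w' / 'w_2' pair ends up side by side under 'a', 'b', 'c'.
result = joined.reorder_levels([1, 0], axis=1).sort_index(axis=1)

print(top_after_join)
print(result.columns.tolist()[:4])
```

Rows 2 and 3 have no match in df2, so their suffixed columns are NaN, which matches the output shown above.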
# 2. merge + reorder_levels
df3 = (pd.merge(df1.reorder_levels([1, 0], axis=1),
                df2.reorder_levels([1, 0], axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
         .reorder_levels([1, 0], axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[2]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
# 3. merge + swaplevel
df3 = (pd.merge(df1.swaplevel(axis=1),
                df2.swaplevel(axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
         .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[3]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
# 4. join + swaplevel
df3 = (df1.swaplevel(i=0, j=1, axis=1)
          .join(df2.swaplevel(axis=1), rsuffix='_2')
          .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[4]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
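The remark about 3 or more levels can be illustrated with a rough sketch. The helper join_with_inner_suffix and the three-level frames big and small are hypothetical and made up for this illustration. The sketch assumes, as in the examples above, that the suffix is attached to the outermost column level and that the innermost labels do not also appear in the outer levels.

```python
import numpy as np
import pandas as pd


def join_with_inner_suffix(left, right, rsuffix='_2'):
    """Left-join on the row index, placing the suffix on the innermost column level.

    Hypothetical helper for illustration; works for any number of column levels.
    """
    n = left.columns.nlevels
    to_front = [n - 1] + list(range(n - 1))  # innermost level becomes the first
    to_back = list(range(1, n)) + [0]        # inverse: first level goes back to last
    joined = (left.reorder_levels(to_front, axis=1)
                  .join(right.reorder_levels(to_front, axis=1), rsuffix=rsuffix))
    return joined.reorder_levels(to_back, axis=1).sort_index(axis=1)


# Made-up three-level example: ['a', 'b'] x ['p', 'q'] x ['w', 'x']
cols3 = pd.MultiIndex.from_product([['a', 'b'], ['p', 'q'], ['w', 'x']])
big = pd.DataFrame(np.ones([4, 8]), columns=cols3, index=range(4))
small = pd.DataFrame(np.zeros([2, 8]), columns=cols3, index=range(2))

out = join_with_inner_suffix(big, small)
print(out.columns.tolist()[:4])
```

With only two levels, to_front and to_back both reduce to [1, 0], which is the same reordering used in the examples above; swaplevel() would not be enough for three levels because it only exchanges two of them.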