Merging two dataframes with hierarchical columns
This is my first time using a MultiIndex in pandas, and I need some help merging two dataframes with hierarchical columns. Here are my two dataframes:
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
     a         b         c
     w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0  1.0
df2 = pd.DataFrame(np.zeros([2,6]),columns=col_index, index=range(2))
     a         b         c
     w    x    w    x    w    x
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0
When I use the merge method, I get the following result:
pd.merge(df1, df2, how='left', suffixes=('', '_2'), left_index=True, right_index=True)
     a         b         c       a_2       b_2       c_2
     w    x    w    x    w    x    w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
But I would like to merge the two dataframes at the lower level, so that the suffixes are applied to ['w', 'x'], like this:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
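You can use join or merge together with swaplevel() or reorder_levels(). The suffix is applied to the top column level, so move the inner level to the top first, combine the frames, and move it back. Then call .sort_index() with axis=1 to sort the columns by their labels.

.join() is the better fit here, because it joins on the index with how='left' by default. .swaplevel() is more convenient when there are two levels (as in this case), while .reorder_levels() is more convenient with three or more levels. Below are the four combinations of these methods. For this particular example, I think .join() / .swaplevel() is the cleanest (see the last example):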
df3 = (df1.reorder_levels([1, 0], axis=1)
          .join(df2.reorder_levels([1, 0], axis=1), rsuffix='_2')
          .reorder_levels([1, 0], axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[1]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
df3 = (pd.merge(df1.reorder_levels([1, 0], axis=1),
                df2.reorder_levels([1, 0], axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
         .reorder_levels([1, 0], axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[2]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
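The reorder_levels variants above should carry over to deeper hierarchies, which is where that method is said to be more convenient. The following is only a rough sketch under assumptions: the three-level column layout (group / subgroup / measure), the labels 'k1' and 'k2', and the names to_front and to_back are made up for illustration and do not come from the question. It also assumes that the labels of the innermost level do not appear in any other level.

```python
import numpy as np
import pandas as pd

# Hypothetical three-level columns: group / subgroup / measure.
cols = pd.MultiIndex.from_product([["a", "b"], ["k1", "k2"], ["w", "x"]])
left = pd.DataFrame(np.ones((3, 8)), columns=cols, index=range(3))
right = pd.DataFrame(np.zeros((2, 8)), columns=cols, index=range(2))

# reorder_levels takes, for each new position, the old position to put there.
to_front = [2, 0, 1]  # (group, subgroup, measure) -> (measure, group, subgroup)
to_back = [1, 2, 0]   # (measure, group, subgroup) -> (group, subgroup, measure)

joined = (
    left.reorder_levels(to_front, axis=1)                          # measure on top
    .join(right.reorder_levels(to_front, axis=1), rsuffix="_2")    # suffix hits the top level
    .reorder_levels(to_back, axis=1)                               # restore original order
    .sort_index(axis=1)                                            # group the suffixed columns
)

print(joined.columns.tolist()[:4])
```

With only two levels, [1, 0] serves as both the forward and the backward order, which is why the examples above can reuse the same list.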
df3 = (pd.merge(df1.swaplevel(axis=1),
                df2.swaplevel(axis=1),
                how='left', left_index=True, right_index=True, suffixes=('', '_2'))
         .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[3]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
df3 = (df1.swaplevel(i=0, j=1, axis=1)
          .join(df2.swaplevel(axis=1), rsuffix='_2')
          .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))
df3
Out[4]:
     a                   b                   c
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
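If this pattern is needed in more than one place, the .join() / .swaplevel() variant could be wrapped in a small function. This is a minimal sketch, not a definitive implementation: the name join_with_inner_suffix and its suffix parameter are hypothetical, and it assumes exactly two column levels and a left join on the row index, as in the question.

```python
import numpy as np
import pandas as pd


def join_with_inner_suffix(left, right, suffix="_2"):
    """Left-join two frames on the index, suffixing the inner column level.

    Hypothetical helper; assumes both frames have two-level columns.
    """
    return (
        left.swaplevel(axis=1)                            # inner level to the top
        .join(right.swaplevel(axis=1), rsuffix=suffix)    # suffix lands on that level
        .swaplevel(axis=1)                                # restore the original order
        .sort_index(axis=1)                               # interleave w, w_2, x, x_2
    )


col_index = pd.MultiIndex.from_product([["a", "b", "c"], ["w", "x"]])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

result = join_with_inner_suffix(df1, df2)
print(result["a"].columns.tolist())
```

Selecting a single top-level label such as result["a"] is a quick way to check that the suffix ended up on the inner level.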