pandas: join on MultiIndex with a different number of levels

Question

How can one join two pandas DataFrames on a MultiIndex when the indexes have a different number of levels?

import pandas as pd
t1 = pd.DataFrame(data={'a1':[0,0,1,1,2,2],
                        'a2':[0,1,0,1,0,1],
                        'x':[1.,2.,3.,4.,5.,6.]})
t1.set_index(['a1','a2'], inplace=True)
t1.sort_index(inplace=True)
t2 = pd.DataFrame(data={'b1':[0,1,2],
                        'y':[20.,40.,60.]})
t2.set_index(['b1'], inplace=True)
t2.sort_index(inplace=True)

Expected result for joining on 'a1' => 'b1':

         x     y
a1 a2
0  0   1.0  20.0
   1   2.0  20.0
1  0   3.0  40.0
   1   4.0  40.0
2  0   5.0  60.0
   1   6.0  60.0

Another example, joining on ['a1','a2'] => ['b1','b2']:

import pandas as pd, numpy as np
t1 = pd.DataFrame(data={'a1':[0,0,0,0,1,1,1,1,2,2,2,2],
                        'a2':[3,3,4,4,3,3,4,4,3,3,4,4],
                        'a3':[7,8,7,8,7,8,7,8,7,8,7,8],
                        'x':[1.,2.,3.,4.,5.,6.,7.,8.,9.,10.,11.,12.]})
t1.set_index(['a1','a2','a3'], inplace=True)
t1.sort_index(inplace=True)
t2 = pd.DataFrame(data={'b1':[0,0,1,1,2,2],
                        'b2':[3,4,3,4,3,4],
                        'y':[10.,20.,30.,40.,50.,60.]})
t2.set_index(['b1','b2'], inplace=True)
t2.sort_index(inplace=True)

>>> t1
             x
a1 a2 a3   
0  3  7    1.0
      8    2.0
   4  7    3.0
      8    4.0
1  3  7    5.0
      8    6.0
   4  7    7.0
      8    8.0
2  3  7    9.0
      8   10.0
   4  7   11.0
      8   12.0
>>> t2
          y
b1 b2
0  3   10.0
   4   20.0
1  3   30.0
   4   40.0
2  3   50.0
   4   60.0

Expected result for joining on ['a1','a2'] => ['b1','b2']:

             x     y
a1 a2 a3         
0  3  7    1.0  10.0
      8    2.0  10.0
   4  7    3.0  20.0
      8    4.0  20.0
1  3  7    5.0  30.0
      8    6.0  30.0
   4  7    7.0  40.0
      8    8.0  40.0
2  3  7    9.0  50.0
      8   10.0  50.0
   4  7   11.0  60.0
      8   12.0  60.0

The solution should work when joining on multiple index levels.

Thank you for your help!

Answer 1

You can use pd.Index.get_level_values and map a Series from t2:

t1['y'] = t1.index.get_level_values(0).map(t2['y'].get)

print(t1)

         x     y
a1 a2           
0  0   1.0  20.0
   1   2.0  20.0
1  0   3.0  40.0
   1   4.0  40.0
2  0   5.0  60.0
   1   6.0  60.0
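
The one-liner above covers a single join level. A minimal sketch of how the same lookup idea might extend to the question's 2nd example follows; it assumes t2's index is unique and swaps Index.map for a plain dict lookup over tuples, which is a variation rather than the answer's own method.

```python
import pandas as pd

# Data from the question's 2nd example
t1 = pd.DataFrame({'a1': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'a2': [3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4],
                   'a3': [7, 8, 7, 8, 7, 8, 7, 8, 7, 8, 7, 8],
                   'x': [float(i) for i in range(1, 13)]})
t1 = t1.set_index(['a1', 'a2', 'a3']).sort_index()
t2 = pd.DataFrame({'b1': [0, 0, 1, 1, 2, 2],
                   'b2': [3, 4, 3, 4, 3, 4],
                   'y': [10., 20., 30., 40., 50., 60.]})
t2 = t2.set_index(['b1', 'b2']).sort_index()

# Plain dict keyed by (b1, b2) tuples
lookup = t2['y'].to_dict()

# Dropping 'a3' leaves one (a1, a2) tuple per t1 row
keys = t1.index.droplevel('a3')

# Keys missing from t2 fall back to NaN instead of raising
t1['y'] = [lookup.get(key, float('nan')) for key in keys]
print(t1)
```

The list comprehension runs in Python, so this is likely slower than an index-based join on large frames.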

Answer 2

You could merge t1 and t2 directly on the index level named a1 in t1 and the single index of t2:

t1.merge(t2, left_on=t1.index.get_level_values('a1').values, right_index=True)

         x     y
a1 a2           
0  0   1.0  20.0
   1   2.0  20.0
1  0   3.0  40.0
   1   4.0  40.0
2  0   5.0  60.0
   1   6.0  60.0
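
For the 2nd example, one possible merge-based sketch is to move the index levels into columns, merge on named columns, and then restore the index. This is an alternative that avoids passing an array as the key, not the answer's own code; the name `merged` is chosen here for illustration.

```python
import pandas as pd

# Data from the question's 2nd example
t1 = pd.DataFrame({'a1': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'a2': [3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4],
                   'a3': [7, 8, 7, 8, 7, 8, 7, 8, 7, 8, 7, 8],
                   'x': [float(i) for i in range(1, 13)]})
t1 = t1.set_index(['a1', 'a2', 'a3']).sort_index()
t2 = pd.DataFrame({'b1': [0, 0, 1, 1, 2, 2],
                   'b2': [3, 4, 3, 4, 3, 4],
                   'y': [10., 20., 30., 40., 50., 60.]})
t2 = t2.set_index(['b1', 'b2']).sort_index()

merged = (
    t1.reset_index()                        # index levels become columns
      .merge(t2.reset_index(), how='left',  # a left merge keeps every t1 row
             left_on=['a1', 'a2'], right_on=['b1', 'b2'])
      .drop(columns=['b1', 'b2'])           # duplicate key columns from t2
      .set_index(['a1', 'a2', 'a3'])        # restore t1's index
)
print(merged)
```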

Answer 3

Use reindex on t2, setting the level parameter as appropriate, and assign the result directly to t1:

t1['y'] = t2['y'].reindex(t1.index, level='a1')

         x     y
a1 a2           
0  0   1.0  20.0
   1   2.0  20.0
1  0   3.0  40.0
   1   4.0  40.0
2  0   5.0  60.0
   1   6.0  60.0

To reindex on multiple levels, simply pass a list as the level parameter, e.g. ['a1', 'a2'].
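
When t2 itself has a MultiIndex, as in the 2nd example, a different reindex-based route may be easier to reason about: drop the level that t2 lacks and reindex t2 against the remaining key levels. The sketch below assumes t2's index is unique; `keys`, `aligned` and `result` are names introduced here for illustration.

```python
import pandas as pd

# Data from the question's 2nd example
t1 = pd.DataFrame({'a1': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'a2': [3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4],
                   'a3': [7, 8, 7, 8, 7, 8, 7, 8, 7, 8, 7, 8],
                   'x': [float(i) for i in range(1, 13)]})
t1 = t1.set_index(['a1', 'a2', 'a3']).sort_index()
t2 = pd.DataFrame({'b1': [0, 0, 1, 1, 2, 2],
                   'b2': [3, 4, 3, 4, 3, 4],
                   'y': [10., 20., 30., 40., 50., 60.]})
t2 = t2.set_index(['b1', 'b2']).sort_index()

# One (a1, a2) key per t1 row, in t1's row order
keys = t1.index.droplevel('a3')

# Repeat t2's rows to line up with t1; keys absent from t2 become NaN
aligned = t2.reindex(keys)

# Rows now correspond one-to-one, so t1's full index can be attached
aligned.index = t1.index

result = pd.concat([t1, aligned], axis=1)
print(result)
```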

Answer 4

Solution to the 1st example:

# Move 'a2' to a column, join on 'a1' => 'b1', then restore 'a2' as an index level
t1.reset_index('a2').join(t2).rename_axis('a1').set_index('a2', append=True)

Solution to the 2nd example:

# Align t2's level names with t1, join on ('a1', 'a2'), then restore 'a3'
t1.reset_index('a3').join(
    t2.rename_axis(index={'b1': 'a1', 'b2': 'a2'})
).set_index('a3', append=True)
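
A shorter variation on the same join idea is sketched below. It assumes a pandas version in which the `on` argument of DataFrame.join accepts index level names of the caller (documented for recent releases); the frame names `s1`, `s2`, `r1` and `r2` are introduced here for illustration.

```python
import pandas as pd

# 1st example from the question
s1 = pd.DataFrame({'a1': [0, 0, 1, 1, 2, 2],
                   'a2': [0, 1, 0, 1, 0, 1],
                   'x': [1., 2., 3., 4., 5., 6.]}).set_index(['a1', 'a2']).sort_index()
s2 = pd.DataFrame({'b1': [0, 1, 2],
                   'y': [20., 40., 60.]}).set_index('b1').sort_index()

# 2nd example from the question
t1 = pd.DataFrame({'a1': [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2],
                   'a2': [3, 3, 4, 4, 3, 3, 4, 4, 3, 3, 4, 4],
                   'a3': [7, 8, 7, 8, 7, 8, 7, 8, 7, 8, 7, 8],
                   'x': [float(i) for i in range(1, 13)]})
t1 = t1.set_index(['a1', 'a2', 'a3']).sort_index()
t2 = pd.DataFrame({'b1': [0, 0, 1, 1, 2, 2],
                   'b2': [3, 4, 3, 4, 3, 4],
                   'y': [10., 20., 30., 40., 50., 60.]})
t2 = t2.set_index(['b1', 'b2']).sort_index()

# `on` names levels of the caller; they are matched against the other frame's index
r1 = s1.join(s2, on='a1')
r2 = t1.join(t2, on=['a1', 'a2'])
print(r1)
print(r2)
```

With several names in `on`, the other frame needs a MultiIndex with the same number of levels, matched in order.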

Answer 5

A slow way to do the join in the 2nd example:

# Assign each t2 value to every t1 row whose leading index levels equal i2
for col in t2.columns:
    for i2 in t2.index:
        t1.loc[i2 + (slice(None),), col] = t2.loc[i2, col]

The task is to vectorize it and to put slice(None) automatically in the correct positions when building a t1 index key.

Vectorized version for the 2nd example:

# Build the (a1, a2) key for every t1 row, pre-create t2's columns, then fill them by position
m = list(zip(t1.index.get_level_values('a1'), t1.index.get_level_values('a2')))
t1 = t1.assign(**dict(zip(t2.columns, [np.nan] * len(t2.columns))))
t1[t2.columns] = t2.loc[m, :].values

Vectorized version for the 1st example:

import numpy as np  # the 1st example's setup imports only pandas
m = t1.index.get_level_values('a1')
t1 = t1.assign(**dict(zip(t2.columns, [np.nan] * len(t2.columns))))
t1[t2.columns] = t2.loc[m, :].values

5 answers

Answer 1
2 2018-05-22 22:23:20

Answer 2
1 2018-05-22 22:33:25

Answer 3
1 2018-05-22 22:37:17

Answer 4
1 accepted 2019-05-17 19:57:21

Answer 5
0 2018-05-23 15:42:27
