How to concat with MultiIndex in Pandas

Question

I have two DataFrames like these:

df1

ID      Value1      Amount2
1        100         10
2        400         20
3        300         50

df2

ID      Value1     Amount2
2        200         20
3        300         30

I want to get a table like this from these two DataFrames.

Desired Output:

ID      Value        Amount       Difference_Value         Difference_Amount
      df1    df2    df1   df2        
1     100     0     10     0            100                      10
2     400    200    20    20            200                       0
3     300    300    50    30             0                       20

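I am fairly new to MultiIndex. I know this is possible, but I have not found another question that covers what I need.

I need the Value, Amount, Difference_Value and Difference_Amount headers to end up as merged cells in Excel, which is why I am asking.

Thank you.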

Answer 1

If a MultiIndex on all columns is acceptable:

First convert ID to the index with DataFrame.set_index, subtract with DataFrame.sub and join the pieces with concat. Finally, reorder the column MultiIndex with DataFrame.swaplevel and DataFrame.sort_index:

df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

df3 = df1.sub(df2, fill_value=0)

df = (pd.concat([df1, df2, df3], axis=1, keys=(['df1','df2', 'diff']))
        .swaplevel(1,0, axis=1)
        .fillna(0)
        .sort_index(axis=1))
print (df)
   Amount2             Value1              
       df1   df2  diff    df1    df2   diff
ID                                         
1       10   0.0  10.0    100    0.0  100.0
2       20  20.0   0.0    400  200.0  200.0
3       50  30.0  20.0    300  300.0    0.0
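
For anyone who wants to run the above from scratch, here is a minimal self-contained sketch. The two frames are rebuilt from the tables in the question with ID already set as the index, so the set_index step is folded into the construction; everything else follows the answer's calls.

```python
import pandas as pd

# Sample frames copied from the question, with ID as the index.
df1 = pd.DataFrame(
    {"Value1": [100, 400, 300], "Amount2": [10, 20, 50]},
    index=pd.Index([1, 2, 3], name="ID"),
)
df2 = pd.DataFrame(
    {"Value1": [200, 300], "Amount2": [20, 30]},
    index=pd.Index([2, 3], name="ID"),
)

# Row-aligned difference; IDs missing from df2 are treated as 0.
df3 = df1.sub(df2, fill_value=0)

df = (
    pd.concat([df1, df2, df3], axis=1, keys=["df1", "df2", "diff"])
    .swaplevel(0, 1, axis=1)  # put Value1/Amount2 on the top level
    .fillna(0)                # ID 1 has no row in df2
    .sort_index(axis=1)       # group the columns by their top level
)
print(df)
```

Regarding the Excel part of the question: if an Excel engine such as openpyxl is installed, df.to_excel('out.xlsx') (the file name is only a placeholder) should write both header levels and merge the top-level cells, since merge_cells defaults to True.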

If you join a DataFrame with MultiIndex columns to one without a MultiIndex, the column labels come back as tuples instead of a MultiIndex:

df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

df3 = df1.sub(df2, fill_value=0)

df = (pd.concat([df1, df2], axis=1, keys=(['df1','df2']))
        .swaplevel(1,0, axis=1)
        .fillna(0)
        .sort_index(axis=1)
        .join(df3.add_prefix('Diff_')))
print (df)
    (Amount2, df1)  (Amount2, df2)  (Value1, df1)  (Value1, df2)  Diff_Value1  \
ID                                                                              
1               10             0.0            100            0.0        100.0   
2               20            20.0            400          200.0        200.0   
3               50            30.0            300          300.0          0.0   

    Diff_Amount2  
ID                
1           10.0  
2            0.0  
3           20.0
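
One way to keep a real MultiIndex while still getting separate difference columns, as in the desired output, is to give the difference frame an empty second level before concatenating. This is only a sketch and not part of the original answer; the Diff_ prefix and the empty-string second level are my own choices. It also sidesteps the mixed-level join, which newer pandas releases may refuse outright as far as I know.

```python
import pandas as pd

# Sample frames copied from the question, with ID as the index.
df1 = pd.DataFrame(
    {"Value1": [100, 400, 300], "Amount2": [10, 20, 50]},
    index=pd.Index([1, 2, 3], name="ID"),
)
df2 = pd.DataFrame(
    {"Value1": [200, 300], "Amount2": [20, 30]},
    index=pd.Index([2, 3], name="ID"),
)

# Difference frame; IDs missing from df2 are treated as 0.
diff = df1.sub(df2, fill_value=0)
# Assumption: label the differences Diff_<name> and pair each label with
# an empty string so the frame has two column levels like the other one.
diff.columns = pd.MultiIndex.from_product(
    [diff.add_prefix("Diff_").columns, [""]]
)

both = (
    pd.concat([df1, df2], axis=1, keys=["df1", "df2"])
    .swaplevel(0, 1, axis=1)
    .fillna(0)
    .sort_index(axis=1)
)

# Both frames now have two column levels, so concat keeps a MultiIndex.
out = pd.concat([both, diff], axis=1)
print(out)
```

Because every column label has two levels, out.columns stays a MultiIndex and the difference columns simply have a blank second header row.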

Answer 2

You can try df.merge and then split the column labels with pd.Index.str.split.

Use df.assign and pd.Series.sub to add the difference columns.

d = df1.merge(df2,how='outer',on='ID',suffixes=('-df1','-df2')
).fillna(0)
d
   ID  Value1-df1  Amount2-df1  Value1-df2  Amount2-df2
0   1         100           10         0.0          0.0
1   2         400           20       200.0         20.0
2   3         300           50       300.0         30.0
d = d.assign(diff_value = d['Value1-df1'].sub(d['Value1-df2']),
             diff_amount = d['Amount2-df1'].sub(d['Amount2-df2'])).set_index('ID')
d
    Value1-df1  Amount2-df1  Value1-df2  Amount2-df2  diff_value  diff_amount
ID
1          100           10         0.0          0.0       100.0         10.0
2          400           20       200.0         20.0       200.0          0.0
3          300           50       300.0         30.0         0.0         20.0

Now split the column labels on '-' with expand=True to get a MultiIndex, then use df.sort_index.

d.columns = d.columns.str.split('-',expand=True) #expand= True makes it MultiIndex
d.sort_index(axis=1)

   Amount2       Value1        diff_amount diff_value
       df1   df2    df1    df2         NaN        NaN
ID
1       10   0.0    100    0.0        10.0      100.0
2       20  20.0    400  200.0         0.0      200.0
3       50  30.0    300  300.0        20.0        0.0
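
Put together as one runnable piece, the merge approach might look like the sketch below. The frames are rebuilt from the question with ID as an ordinary column, which is what merge(on='ID') expects; the sorted result is assigned back to d here, whereas the snippet above only displays it.

```python
import pandas as pd

# Sample frames copied from the question; ID stays a regular column.
df1 = pd.DataFrame(
    {"ID": [1, 2, 3], "Value1": [100, 400, 300], "Amount2": [10, 20, 50]}
)
df2 = pd.DataFrame({"ID": [2, 3], "Value1": [200, 300], "Amount2": [20, 30]})

# An outer merge keeps ID 1, which exists only in df1; its gaps become 0.
d = df1.merge(df2, how="outer", on="ID", suffixes=("-df1", "-df2")).fillna(0)

d = d.assign(
    diff_value=d["Value1-df1"].sub(d["Value1-df2"]),
    diff_amount=d["Amount2-df1"].sub(d["Amount2-df2"]),
).set_index("ID")

# 'Value1-df1' becomes ('Value1', 'df1'); labels without '-' get a
# missing value on the second level.
d.columns = d.columns.str.split("-", expand=True)
d = d.sort_index(axis=1)
print(d)
```

The difference columns have no '-' in their names, so their second level is shown as NaN, as in the output above.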

Answer 1
2 accepted 2020-06-07 06:41:06

Answer 2
1 2020-06-07 07:16:29
