简体   繁体   English

如何在 Pandas 中与 MultiIndex 连接

[英]How to concatenate with MultiIndex in Pandas

I have 2 dataframes like this:我有两个这样的数据框:

df1

ID      Value1      Amount2
1        100         10
2        400         20
3        300         50
df2

ID      Value1     Amount2
2        200         20
3        300         30

I want to get a table like this form these two dfs.我想从这两个 dfs 中得到一个像这样的表格。

Desired Output:

ID      Value        Amount       Difference_Value         Difference_Amount
      df1    df2    df1   df2        
1     100     0     10     0            100                      10
2     400    200    20    20            200                       0
3     300    300    50    30             0                       20

I am a bit new to Multilevel index.我对多级索引有点陌生。 I know this is possible but didn't find other questions helpful for my need.我知道这是可能的,但没有发现其他问题对我的需要有帮助。

I need this Value, Amount, Difference_Value and Difference_amount columns to be merged cells in excel so that I need to know this.我需要将此Value, Amount, Difference_Value and Difference_amount列合并到 excel 中的单元格,以便我需要知道这一点。

Thank you.谢谢你。

It is possible if MultiIndex for all columns:如果所有列的MultiIndex是可能的:

First convert ID to index by DataFrame.set_index , subtract by DataFrame.sub and join together by concat , last for change MultiIndex is used DataFrame.swaplevel and DataFrame.sort_index : First convert ID to index by DataFrame.set_index , subtract by DataFrame.sub and join together by concat , last for change MultiIndex is used DataFrame.swaplevel and DataFrame.sort_index :

df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

df3 = df1.sub(df2, fill_value=0)

df = (pd.concat([df1, df2, df3], axis=1, keys=(['df1','df2', 'diff']))
        .swaplevel(1,0, axis=1)
        .fillna(0)
        .sort_index(axis=1))
print (df)
   Amount2             Value1              
       df1   df2  diff    df1    df2   diff
ID                                         
1       10   0.0  10.0    100    0.0  100.0
2       20  20.0   0.0    400  200.0  200.0
3       50  30.0  20.0    300  300.0    0.0

If try join together MultiIndex and no MultiIndex Dataframes, get tuples instead MultiIndex :如果尝试将MultiIndex和没有MultiIndex数据帧连接在一起,请获取元组而不是MultiIndex

df1 = df1.set_index('ID')
df2 = df2.set_index('ID')

df3 = df1.sub(df2, fill_value=0)

df = (pd.concat([df1, df2, df3], axis=1, keys=(['df1','df2']))
        .swaplevel(1,0, axis=1)
        .fillna(0)
        .sort_index(axis=1)
        .join(df3.add_prefix('Diff_')))
print (df)
    (Amount2, df1)  (Amount2, df2)  (Value1, df1)  (Value1, df2)  Diff_Value1  \
ID                                                                              
1               10             0.0            100            0.0        100.0   
2               20            20.0            400          200.0        200.0   
3               50            30.0            300          300.0          0.0   

    Diff_Amount2  
ID                
1           10.0  
2            0.0  
3           20.0 

You can try using df.merge then split in column using pd.index.str.split您可以尝试使用df.merge然后使用pd.index.str.split在列中拆分

Use df.assign with pd.Series.sub for assigning difference values.使用df.assignpd.Series.sub来分配差异值。

d = df.merge(df1,how='outer',on='ID',suffixes=('-df1','-df2')
).fillna(0)
d
   ID  Value1-df1  Amount2-df1  Value1-df2  Amount2-df2
0   1         100           10         0.0          0.0
1   2         400           20       200.0         20.0
2   3         300           50       300.0         30.0
d = d.assign(diff_value = d['Value1-df1'].sub(d['Value1-df2']),
             diff_amount = d['Amount2-df1'].sub(d['Amount2-df2'])).set_index('ID')
d
    Value1-df1  Amount2-df1  Value1-df2  Amount2-df2  diff_value  diff_amount
ID
1          100           10         0.0          0.0       100.0         10.0
2          400           20       200.0         20.0       200.0          0.0
3          300           50       300.0         30.0         0.0         20.0

Now, split the column at '-' with expand=True for getting MultiIndex then use df.sort_index .现在,使用expand=True将列拆分为'-'以获取MultiIndex ,然后使用df.sort_index

d.columns = d.columns.str.split('-',expand=True) #expand= True makes it MultiIndex
d.sort_index(axis=1)

   Amount2       Value1        diff_amount diff_value
       df1   df2    df1    df2         NaN        NaN
ID
1       10   0.0    100    0.0        10.0      100.0
2       20  20.0    400  200.0         0.0      200.0
3       50  30.0    300  300.0        20.0        0.0

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM