
Pandas MultiIndex subtract based on only two matching index levels

Say I have a Pandas MultiIndex DataFrame with three index levels:

import pandas as pd
import numpy as np
arrays = [['UK', 'UK', 'US', 'FR'], ['Firm1', 'Firm1', 'Firm2', 'Firm1'], ['Andy', 'Peter', 'Peter', 'Andy']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Country', 'Firm', 'Responsible'))
df_3idx = pd.DataFrame(np.random.randn(4,3), index = idx)
df_3idx
                                  0         1         2
Country Firm  Responsible                              
UK      Firm1 Andy         0.237655  2.049636  0.480805
              Peter        1.135344  0.745616 -0.577377
US      Firm2 Peter        0.034786 -0.278936  0.877142
FR      Firm1 Andy         0.048224  1.763329 -1.597279

I also have another DataFrame whose index consists of the unique combinations of the first two index levels (Country and Firm) from the data above:

arrays = [['UK', 'US', 'FR'], ['Firm1', 'Firm2', 'Firm1']]
idx = pd.MultiIndex.from_arrays(arrays, names = ('Country', 'Firm'))
df_2idx = pd.DataFrame(np.random.randn(3,1), index = idx)
df_2idx
                      0
Country Firm           
UK      Firm1 -0.103828
US      Firm2  0.096192
FR      Firm1 -0.686631

I want to subtract from each row of df_3idx the corresponding value in df_2idx. For instance, I want to subtract -0.103828 from every value in the first two rows, because the first two index levels (Country and Firm) match in both DataFrames.

Does anybody know how to do this? I figured I could simply unstack the first DataFrame and then subtract, but I get an error message.

df_3idx.unstack('Responsible').sub(df_2idx, axis=0)

ValueError: cannot join with no overlapping index names

Unstacking might not be a good solution anyway, since my data is very large and unstacking could take a long time.

I would appreciate any help. Many thanks in advance!
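A possible explanation for the error, offered as an assumption rather than something confirmed in this thread: passing df_2idx as a whole DataFrame makes sub align the columns as well as the rows, and the unstacked columns share no level names with the single unnamed column of df_2idx. The sketch below uses fixed illustrative values instead of the random data above, and the name `wide` is hypothetical. It suggests that selecting the single column as a Series lets the unstacked frame align on Country and Firm alone:

```python
import numpy as np
import pandas as pd

# Fixed illustrative values (assumption: not the random data from the question).
idx3 = pd.MultiIndex.from_arrays(
    [
        ["UK", "UK", "US", "FR"],
        ["Firm1", "Firm1", "Firm2", "Firm1"],
        ["Andy", "Peter", "Peter", "Andy"],
    ],
    names=("Country", "Firm", "Responsible"),
)
df_3idx = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), index=idx3)

idx2 = pd.MultiIndex.from_arrays(
    [["UK", "US", "FR"], ["Firm1", "Firm2", "Firm1"]],
    names=("Country", "Firm"),
)
df_2idx = pd.DataFrame({0: [10.0, 20.0, 30.0]}, index=idx2)

# Subtract the single column as a Series so only the row index is aligned.
wide = df_3idx.unstack("Responsible").sub(df_2idx[0], axis=0)
print(wide)
```

Combinations that do not exist in the long data (for example FR, Firm1, Peter) show up as NaN in the wide result, which is one more reason unstacking can be wasteful on large data.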

There is a related question that is not focused on MultiIndex.

However, the answer is the same either way: the sub method aligns on the matching index levels.

Use pd.DataFrame.sub with the parameter axis=0:

df_3idx.sub(df_2idx[0], axis=0)

                                  0         1         2
Country Firm  Responsible                              
FR      Firm1 Andy         0.027800  3.316148  0.804833
UK      Firm1 Andy        -2.009797 -1.830799 -0.417737
              Peter       -1.174544  0.644006 -1.150073
US      Firm2 Peter       -2.211121 -3.825443 -4.391965
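Because the data in the question is random, the numbers above cannot be checked by hand. The following is a minimal sketch with fixed illustrative values, so the subtraction is easy to verify. It also includes an explicit alternative that looks up one value per row and keeps the original row order, since the output above lists FR first, which suggests the direct approach can return rows in a different order. The names `direct`, `explicit`, `keys`, `per_row` and `same` are hypothetical and chosen only for this example.

```python
import numpy as np
import pandas as pd

# Fixed illustrative values (assumption: not the random data from the question).
idx3 = pd.MultiIndex.from_arrays(
    [
        ["UK", "UK", "US", "FR"],
        ["Firm1", "Firm1", "Firm2", "Firm1"],
        ["Andy", "Peter", "Peter", "Andy"],
    ],
    names=("Country", "Firm", "Responsible"),
)
df_3idx = pd.DataFrame(np.arange(12, dtype=float).reshape(4, 3), index=idx3)

idx2 = pd.MultiIndex.from_arrays(
    [["UK", "US", "FR"], ["Firm1", "Firm2", "Firm1"]],
    names=("Country", "Firm"),
)
df_2idx = pd.DataFrame({0: [10.0, 20.0, 30.0]}, index=idx2)

# Direct approach from the answer: align on the shared levels Country and Firm.
direct = df_3idx.sub(df_2idx[0], axis=0)

# Explicit alternative: look up one value per row via the first two levels,
# then subtract positionally so the original row order is kept.
keys = df_3idx.index.droplevel("Responsible")
per_row = df_2idx[0].reindex(keys).to_numpy()
explicit = df_3idx.sub(per_row, axis=0)

# Compare both results after putting the rows in the same order.
same = np.allclose(direct.reindex(explicit.index).to_numpy(), explicit.to_numpy())
print(explicit)
print(same)
```

With these values, both UK, Firm1 rows have 10 subtracted, the US, Firm2 row has 20 subtracted, and the FR, Firm1 row has 30 subtracted. The explicit variant avoids unstacking and does not reorder the rows.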
