
How to subtract columns in a multiindex dataframe?

I have a multiindex dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3))*6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})

df_mult = df.set_index(['ind1', 'ind2', 'ind3'])

                val1  val2
ind1 ind2 ind3            
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87

What I want to do is subtract the values in df_mult.loc['a', 'e', :] and df_mult.loc['b', 'e', :] from the corresponding values in df_mult.loc['a', ['c', 'd'], :] and df_mult.loc['b', ['c', 'd'], :], respectively. The expected outcome would be:

                val1  val2
ind1 ind2 ind3            
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      115    85
          1      116    86
          2      117    87

Ideally, something like this would work:

df_mult.loc['a', ['c', 'd'], :].subtract(df_mult.loc['a', 'e', :])

but this gives me a lot of NaNs.

How would I do this?
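A likely reason for the NaNs: arithmetic between two DataFrames first aligns rows on their index labels. The 'c' rows are labelled ('a', 'c', k) and the 'e' rows ('a', 'e', k), so no label matches and every cell becomes NaN. The sketch below reproduces that and shows that dropping the ind1 and ind2 levels (here with xs, which the original post does not use) lets the rows align on ind3 instead.

```python
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
idx = pd.IndexSlice

# Selecting with IndexSlice keeps all three index levels
c_full = df_mult.loc[idx['a', 'c', :], :]
e_full = df_mult.loc[idx['a', 'e', :], :]
# No common labels: the result is the union of both indexes, all NaN
naive = c_full.subtract(e_full)

# xs drops ind1 and ind2, leaving only ind3 (0, 1, 2) in both frames,
# so the rows line up and the subtraction is element-wise
c_rows = df_mult.xs(('a', 'c'), level=['ind1', 'ind2'])
e_rows = df_mult.xs(('a', 'e'), level=['ind1', 'ind2'])
diff = c_rows.subtract(e_rows)
print(diff)
```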

UPDATE2: with kind help from @Divakar (idx is pd.IndexSlice, as defined in the UPDATE below):

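def repeat_blocks(a, repeats=2, block_length=None):
    # Repeat each block of `block_length` consecutive rows of the 2-D array `a`
    # `repeats` times: [A, B] -> [A, A, B, B] for repeats=2.
    N = a.shape[0]
    if not block_length:
        # Default assumes `a` consists of exactly two blocks (here: 'a' and 'b')
        block_length = N // 2
    blocks = a.reshape(N // block_length, block_length, -1)
    out = np.repeat(blocks, repeats, axis=0).reshape(N * repeats, -1)
    return out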
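To see what repeat_blocks does, here is a standalone check on a toy array (the array is made up for illustration). With the default block_length = N // 2, each half of the input is repeated twice. Applied to the six 'e' rows (three for 'a', three for 'b'), that yields twelve rows ordered a, a, b, b, which lines up with the 'c' and 'd' rows of each group in the selection.

```python
import numpy as np


def repeat_blocks(a, repeats=2, block_length=None):
    # Same logic as the helper above
    N = a.shape[0]
    if not block_length:
        block_length = N // 2
    # Shape (n_blocks, block_length, n_cols), so axis 0 indexes whole blocks
    blocks = a.reshape(N // block_length, block_length, -1)
    return np.repeat(blocks, repeats, axis=0).reshape(N * repeats, -1)


# Two blocks of two rows each: A = rows 1-2, B = rows 3-4
a = np.array([[1, 10], [2, 20], [3, 30], [4, 40]])
out = repeat_blocks(a)
print(out.tolist())
# [[1, 10], [2, 20], [1, 10], [2, 20], [3, 30], [4, 40], [3, 30], [4, 40]]
```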

In [234]: df_mult.loc[idx[['a','b'], ['c', 'd'], :], :] -= repeat_blocks(df_mult.loc[['a','b'], 'e', :].values)

In [235]: df_mult
Out[235]:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      115    85
          1      116    86
          2      117    87
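A label-based alternative that does not depend on the rows being stored in equally sized blocks: for every row, look up the 'e' row with the same ind1 and ind3, then subtract only where ind2 is not 'e'. This is a sketch, not the approach used in the updates, and it assumes the 'e' rows are unique per (ind1, ind3), as they are in this example.

```python
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])

# The 'e' rows, indexed by the remaining levels (ind1, ind3)
e_rows = df_mult.xs('e', level='ind2')

# For every row of df_mult, fetch the 'e' row with the same (ind1, ind3);
# the target index may contain duplicates, the source index is unique
baseline = e_rows.reindex(df_mult.index.droplevel('ind2'))

# Subtract only where ind2 is not 'e'; the 'e' rows stay untouched
not_e = df_mult.index.get_level_values('ind2') != 'e'
result = df_mult.copy()
result.loc[not_e] = df_mult.loc[not_e].to_numpy() - baseline.to_numpy()[not_e]
print(result)
```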

UPDATE:

In [100]: idx = pd.IndexSlice

In [102]: df_mult.loc[idx['a', ['c', 'd'], :], :] -= \
              np.concatenate([df_mult.loc['a', 'e', :].values] * 2)

In [103]: df_mult
Out[103]:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
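Why the * 2 in np.concatenate: the selection on the left has six rows (three 'c', three 'd') while the 'e' block has three, and subtracting a plain NumPy array is positional, so the shapes must match. Stacking the block twice does that, and np.tile(e, (2, 1)), used in the answer further down, gives the same array. A quick check with the values from the table:

```python
import numpy as np

# The three 'e' rows for ind1 == 'a' (val1, val2), taken from the table above
e = np.array([[106, 76], [107, 77], [108, 78]])

stacked = np.concatenate([e] * 2)  # shape (6, 2): one copy for 'c', one for 'd'
tiled = np.tile(e, (2, 1))         # same array

# The 'c' and 'd' rows for ind1 == 'a'
cd = np.array([[100, 70], [101, 71], [102, 72],
               [103, 73], [104, 74], [105, 75]])
diff = cd - stacked
print(diff.tolist())
```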

Old (incorrect) answer:

In [62]: df_mult.loc['a', 'e', :] -= df_mult.loc['b', 'e', :].values

In [63]: df_mult
Out[63]:
                val1  val2
ind1 ind2 ind3
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0       -9    -9
          1       -9    -9
          2       -9    -9
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87

Are you looking for something like this? (df here is df_mult.)

idx = pd.IndexSlice
# Stack the three 'e' rows twice with np.tile so they line up with the six 'c'/'d' rows
df.loc[idx['a', ['c', 'd'], :], ['val1', 'val2']] = \
    df.loc['a', ['c', 'd'], :].values - np.tile(df.loc['a', 'e', :].values, (2, 1))

df
Out[608]: 
                val1  val2
ind1 ind2 ind3            
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
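The answer above only handles ind1 == 'a'. A hypothetical generalisation that repeats the same np.tile step for every ind1 label could look like this; it assumes each group contains exactly the 'c', 'd' and 'e' blocks of three rows each, as in the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
idx = pd.IndexSlice
cols = ['val1', 'val2']

# Repeat the np.tile subtraction once per ind1 label ('a', then 'b')
for key in df_mult.index.get_level_values('ind1').unique():
    # The 'e' rows of this group are read before any of them could change
    e_vals = df_mult.loc[idx[key, 'e', :], cols].to_numpy()
    # Six target rows ('c' then 'd'), so the three 'e' rows are tiled twice
    df_mult.loc[idx[key, ['c', 'd'], :], cols] -= np.tile(e_vals, (2, 1))

print(df_mult)
```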

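Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM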