How to subtract columns in a multiindex dataframe?
I have a multiindex dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3))*6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
                val1  val2
ind1 ind2 ind3
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
What I want to do is subtract the values in df_mult.loc['a', 'e', :] and df_mult.loc['b', 'e', :] from the values in df_mult.loc['a', ['c', 'd'], :] and df_mult.loc['b', ['c', 'd'], :], respectively. The expected outcome would be:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      115    85
          1      116    86
          2      117    87
Ideally, something like this would work:
df_mult.loc['a', ['c', 'd'], :].subtract(df_mult.loc['a', 'e', :])
but this gives me a lot of NaNs.
How would I do this?
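A note on the NaNs: subtract() matches rows by index label, and a row labelled ('a', 'c', 0) has no counterpart among the 'e' rows, so the operands most likely fail to align. The following is a minimal sketch of one way around that, not the method used in the answers: it looks up the 'e' row that shares ind1 and ind3 with each target row and subtracts the raw values. The names mask, target, base, keys and result are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3))*6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])

# Rows to modify: every row whose ind2 is 'c' or 'd'.
mask = df_mult.index.get_level_values('ind2').isin(['c', 'd'])
target = df_mult[mask]

# The 'e' rows, indexed by (ind1, ind3) only.
base = df_mult.xs('e', level='ind2')

# For each target row, the (ind1, ind3) key of the 'e' row to subtract.
keys = target.index.droplevel('ind2')

# Subtract plain arrays so no label alignment takes place.
result = df_mult.copy()
result.loc[mask] = target.to_numpy() - base.reindex(keys).to_numpy()
print(result)
```

This handles both 'a' and 'b' in one pass and leaves the 'e' rows untouched. It assumes each (ind1, ind3) pair occurs exactly once among the 'e' rows.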
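UPDATE2: with the kind help of @Divakar (idx is pd.IndexSlice, defined in the UPDATE below):
def repeat_blocks(a, repeats=2, block_length=None):
    # Repeat each block of block_length consecutive rows of a, `repeats` times.
    N = a.shape[0]
    if not block_length:
        # Default: split the rows into two equal blocks.
        block_length = N//2
    out = np.repeat(a.reshape(N//block_length, block_length, -1),
                    repeats,
                    axis=0) \
            .reshape(N*repeats, -1)
    return out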
In [234]: df_mult.loc[idx[['a','b'], ['c', 'd'], :], :] -= repeat_blocks(df_mult.loc[['a','b'], 'e', :].values)
In [235]: df_mult
Out[235]:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      115    85
          1      116    86
          2      117    87
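To see what repeat_blocks does on its own, here is a small sketch on a made-up array (toy is a hypothetical input, not data from the question). The helper is restated so the snippet runs by itself.

```python
import numpy as np

def repeat_blocks(a, repeats=2, block_length=None):
    # Same helper as above, restated so this snippet is self-contained.
    N = a.shape[0]
    if not block_length:
        block_length = N // 2
    return (np.repeat(a.reshape(N // block_length, block_length, -1),
                      repeats, axis=0)
              .reshape(N * repeats, -1))

# Hypothetical toy input: two blocks of two rows each.
toy = np.array([[1, 10],
                [2, 20],
                [3, 30],
                [4, 40]])

out = repeat_blocks(toy)
print(out)
# Rows come out as block 1, block 1, block 2, block 2:
# [1 10], [2 20], [1 10], [2 20], [3 30], [4 40], [3 30], [4 40]
```

The default block_length of N//2 assumes exactly two groups, which fits 'a' and 'b' here. With more ind1 groups, block_length would need to be passed explicitly (3 in this data, one row per ind3 value).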
UPDATE:
In [100]: idx = pd.IndexSlice
In [102]: df_mult.loc[idx['a', ['c', 'd'], :], :] -= \
np.concatenate([df_mult.loc['a', 'e', :].values] * 2)
In [103]: df_mult
Out[103]:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
Old (incorrect) answer:
In [62]: df_mult.loc['a', 'e', :] -= df_mult.loc['b', 'e', :].values
In [63]: df_mult
Out[63]:
                val1  val2
ind1 ind2 ind3
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0       -9    -9
          1       -9    -9
          2       -9    -9
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
Are you looking for something like this? (Here df is df_mult.)
idx = pd.IndexSlice
df.loc[idx['a', ['c', 'd'], :],idx['val1','val2']]=df.loc['a', ['c', 'd'], :].values-np.tile(df.loc['a', 'e', :].values, (2, 1))
df
Out[608]:
                val1  val2
ind1 ind2 ind3
a    c    0       -6    -6
          1       -6    -6
          2       -6    -6
     d    0       -3    -3
          1       -3    -3
          2       -3    -3
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87
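The np.tile version above only changes the 'a' rows. As a rough sketch, assuming the same layout for every ind1 value (three 'e' rows, with 'c' and 'd' blocks of three rows each), the same idea can be looped over all ind1 keys. The loop and the names out, base and rows are additions for illustration, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3))*6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
# sort_index() keeps the MultiIndex lexsorted, which label slicing relies on.
df_mult = df.set_index(['ind1', 'ind2', 'ind3']).sort_index()

idx = pd.IndexSlice
out = df_mult.copy()

for key in df_mult.index.get_level_values('ind1').unique():
    # The 'e' rows of this group, as a (3, 2) array.
    base = df_mult.loc[idx[key, 'e', :], :].to_numpy()
    rows = idx[key, ['c', 'd'], :]
    # Stack the 'e' block twice so it lines up with the 'c' and 'd' blocks.
    out.loc[rows, :] = df_mult.loc[rows, :].to_numpy() - np.tile(base, (2, 1))

print(out)
```

Reading from df_mult and writing to a copy keeps the 'e' values intact while the loop runs.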