I have a pandas dataframe with MultiIndex columns, with 3 levels:
import itertools
import numpy as np
import pandas as pd
def mklbl(prefix, n):
    return ["%s%s" % (prefix, i) for i in range(n)]
miindex = pd.MultiIndex.from_product([mklbl('A', 4)])
micolumns = pd.MultiIndex.from_tuples(list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
                                      names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(len(miindex) * len(micolumns)).reshape((len(miindex), len(micolumns))),
                    index=miindex,
                    columns=micolumns).sort_index().sort_index(axis=1)
lvl0 A B
lvl1 a b c a b c
lvl2 bar foo bar foo bar foo bar foo bar foo bar foo
A0 1 0 3 2 5 4 7 6 9 8 11 10
A1 13 12 15 14 17 16 19 18 21 20 23 22
A2 25 24 27 26 29 28 31 30 33 32 35 34
A3 37 36 39 38 41 40 43 42 45 44 47 46
I want to mask this dataframe based on another dataframe whose columns are the last two levels of the column index:
cols = micolumns.droplevel(0).unique()
a_mask = pd.DataFrame(np.random.randn(len(dfmi.index), len(cols)), index=dfmi.index, columns=cols)
a_mask = (np.sign(a_mask) > 0).astype(bool)
a b c
foo bar foo bar foo bar
A0 False False False True True False
A1 True False True False True True
A2 True True True True False False
A3 True False False True True False
What I would like to do is mask the original dataframe according to a_mask. For example, I want to set the original entries to zero wherever a_mask is True.
I tried to use pd.IndexSlice, but it fails silently (i.e. the following code runs, but has no effect):
dfmi.loc[:, pd.IndexSlice[:, a_mask]] = 0 #dfmi is unchanged
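For context: pd.IndexSlice only builds label slices for each level, so inside .loc it selects whole columns. It has no way to express a cell-by-cell boolean pattern, which is presumably why passing a_mask through it does not do what is intended. A minimal sketch of what it is designed for, on a rebuilt copy of the question's frame (idx is just a local alias):

```python
import itertools

import numpy as np
import pandas as pd

# Rebuild the question's frame: 4 rows, sorted columns lvl0 x lvl1 x lvl2
micolumns = pd.MultiIndex.from_tuples(
    list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
    names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(4 * len(micolumns)).reshape(4, len(micolumns)),
                    index=['A0', 'A1', 'A2', 'A3'],
                    columns=micolumns).sort_index(axis=1)

idx = pd.IndexSlice
# Label-based selection: every column with lvl1 == 'a' and lvl2 == 'foo', for all lvl0 values.
# All rows of those columns are set to 0; there is no per-cell choice.
dfmi.loc[:, idx[:, 'a', 'foo']] = 0
print(dfmi.loc[:, idx[:, 'a', :]])
```

Every row of the selected columns is zeroed regardless of any mask; an element-wise pattern needs where/mask instead.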
Any suggestions on how to achieve this?
Edit: In my use case, the labels are constructed with a Cartesian product, so all combinations of (lvl0, lvl1, lvl2) are present. Note, however, that lvl0 takes 2 values {A, B}, while lvl1 takes 3 values {a, b, c}.
I think this way is safer, since the mask columns are looked up by label, in dfmi's column order:
dfmi.where(a_mask.loc[:,dfmi.columns.droplevel(0)].values,0)
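One caveat worth checking: DataFrame.where keeps the entries where the condition is True and replaces the rest, so the line above zeros the cells where a_mask is False. To zero the cells where a_mask is True, as the question asks, DataFrame.mask can be used with the same label-aligned condition. A minimal sketch on a rebuilt copy of the question's data; the seeded random mask is an assumption for repeatability, cond and result are illustrative names, and reindex stands in for the .loc lookup:

```python
import itertools

import numpy as np
import pandas as pd

micolumns = pd.MultiIndex.from_tuples(
    list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
    names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(4 * len(micolumns)).reshape(4, len(micolumns)),
                    index=['A0', 'A1', 'A2', 'A3'],
                    columns=micolumns).sort_index(axis=1)

# Mask over (lvl1, lvl2), columns in foo-before-bar order (unlike dfmi's sorted order)
cols = micolumns.droplevel(0).unique()
rng = np.random.default_rng(0)
a_mask = pd.DataFrame(rng.random((4, len(cols))) > 0.5, index=dfmi.index, columns=cols)

# For each (lvl1, lvl2) label of dfmi, in dfmi's order (labels repeat once per lvl0 value),
# fetch the matching mask column, then drop to a plain boolean array
cond = a_mask.reindex(columns=dfmi.columns.droplevel(0)).to_numpy()
# mask() replaces entries where cond is True, i.e. zeros the masked cells
result = dfmi.mask(cond, 0)
print(result)
```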
Out[191]:
lvl0 A B
lvl1 a b a b
lvl2 bar foo bar foo bar foo bar foo
A0 0 0 0 2 0 0 0 6
A1 9 8 11 0 13 12 15 0
A2 0 16 19 18 0 20 23 22
A3 25 0 0 0 29 0 0 0
I would do it as follows:
mask = pd.concat({k: a_mask for k in dfmi.columns.levels[0]}, axis=1)  # repeat a_mask under each lvl0 key
dfmi.where(~mask, 0)  # keep cells where mask is False, set the rest to 0
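The same idea as a self-contained sketch on a rebuilt copy of the question's data (the seeded random mask is an assumption for repeatability; stacked, cond and result are illustrative names). To avoid depending on how where aligns MultiIndex columns whose level names differ (the level created from the concat keys is unnamed, while dfmi's is lvl0), this variant reindexes the stacked mask to dfmi.columns explicitly and passes a plain boolean array:

```python
import itertools

import numpy as np
import pandas as pd

micolumns = pd.MultiIndex.from_tuples(
    list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
    names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(4 * len(micolumns)).reshape(4, len(micolumns)),
                    index=['A0', 'A1', 'A2', 'A3'],
                    columns=micolumns).sort_index(axis=1)

cols = micolumns.droplevel(0).unique()
rng = np.random.default_rng(0)
a_mask = pd.DataFrame(rng.random((4, len(cols))) > 0.5, index=dfmi.index, columns=cols)

# Repeat a_mask once per lvl0 value; the dict keys become the new outer column level
stacked = pd.concat({k: a_mask for k in dfmi.columns.levels[0]}, axis=1)
# Reorder the stacked mask's columns by label into dfmi's exact column order
cond = stacked.reindex(columns=dfmi.columns).to_numpy()
# Keep cells where the mask is False, write 0 where it is True
result = dfmi.where(~cond, 0)
print(result)
```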
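Working with the underlying array data to edit in place, for memory efficiency (no other dataframe is created). Note that this indexes by position, so it assumes the columns within each lvl0 block of dfmi are in the same order as the columns of a_mask: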
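d = len(dfmi.columns.levels[0])  # number of lvl0 blocks
n = dfmi.shape[1]//d  # number of columns in each block
for i in range(0,d*n,n):
    dfmi.values[:,i:i+n][a_mask] = 0  # zero the masked cells of the block starting at column i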
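Because the loop above slices the underlying array by position, it silently relies on a_mask's columns matching the order inside each lvl0 block; after sort_index(axis=1) the blocks list bar before foo, while a_mask built from micolumns lists foo first. A minimal sketch that reorders the mask by label before the positional loop, on rebuilt data (the seeded mask is an assumption for repeatability; block_cols, m, arr and result are illustrative names). It writes into an explicit copy from to_numpy(copy=True), because whether assignments through dfmi.values reach dfmi depends on the pandas version and the frame's internal layout (with Copy-on-Write, .values can be read-only), so it gives up the in-place memory saving:

```python
import itertools

import numpy as np
import pandas as pd

micolumns = pd.MultiIndex.from_tuples(
    list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
    names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(4 * len(micolumns)).reshape(4, len(micolumns)),
                    index=['A0', 'A1', 'A2', 'A3'],
                    columns=micolumns).sort_index(axis=1)

cols = micolumns.droplevel(0).unique()
rng = np.random.default_rng(0)
a_mask = pd.DataFrame(rng.random((4, len(cols))) > 0.5, index=dfmi.index, columns=cols)

# With a full Cartesian product, every lvl0 block shares this (lvl1, lvl2) order
block_cols = dfmi.columns.droplevel(0).unique()
# Reorder the mask columns by label to match one block, then take a plain bool array
m = a_mask.reindex(columns=block_cols).to_numpy()

arr = dfmi.to_numpy(copy=True)  # explicit copy, safe to write into
n = m.shape[1]                  # columns per lvl0 block
for i in range(0, arr.shape[1], n):
    arr[:, i:i + n][m] = 0      # basic slice is a view, so this writes into arr
result = pd.DataFrame(arr, index=dfmi.index, columns=dfmi.columns)
print(result)
```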
Sample run:
In [833]: dfmi
Out[833]:
lvl0 A B
lvl1 a b c a b c
lvl2 bar foo bar foo bar foo bar foo bar foo bar foo
A0 1 0 3 2 5 4 7 6 9 8 11 10
A1 13 12 15 14 17 16 19 18 21 20 23 22
A2 25 24 27 26 29 28 31 30 33 32 35 34
A3 37 36 39 38 41 40 43 42 45 44 47 46
In [834]: a_mask
Out[834]:
a b c
foo bar foo bar foo bar
A0 True True True False False False
A1 False True False False True False
A2 False True True True False False
A3 False False False False False True
In [835]: d = len(dfmi.columns.levels[0])
...: n = dfmi.shape[1]//d
...: for i in range(0,d*n,n):
...:     dfmi.values[:,i:i+n][a_mask] = 0
In [836]: dfmi
Out[836]:
lvl0 A B
lvl1 a b c a b c
lvl2 bar foo bar foo bar foo bar foo bar foo bar foo
A0 0 0 0 2 5 4 0 0 0 8 11 10
A1 13 0 15 14 0 16 19 0 21 20 0 22
A2 25 0 0 0 29 28 31 0 0 0 35 34
A3 37 36 39 38 41 0 43 42 45 44 47 0
Updated solution, more robust since it does not hardcode the level values:
lvl0_values = dfmi.columns.get_level_values(0).unique()
pd.concat([dfmi[i].mask(a_mask.rename_axis(['lvl1','lvl2'],axis=1),0) for i in lvl0_values],
keys=lvl0_values, axis=1)
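A self-contained sketch of this loop on a rebuilt copy of the question's data with three lvl1 values (the seeded random mask is an assumption for repeatability; named_mask and result are illustrative names). Each dfmi[k] sub-frame has column levels named lvl1 and lvl2, and rename_axis gives a_mask the same names, so mask aligns the condition by label even though the two frames order their columns differently:

```python
import itertools

import numpy as np
import pandas as pd

micolumns = pd.MultiIndex.from_tuples(
    list(itertools.product(['A', 'B'], ['a', 'b', 'c'], ['foo', 'bar'])),
    names=['lvl0', 'lvl1', 'lvl2'])
dfmi = pd.DataFrame(np.arange(4 * len(micolumns)).reshape(4, len(micolumns)),
                    index=['A0', 'A1', 'A2', 'A3'],
                    columns=micolumns).sort_index(axis=1)

cols = micolumns.droplevel(0).unique()
rng = np.random.default_rng(0)
a_mask = pd.DataFrame(rng.random((4, len(cols))) > 0.5, index=dfmi.index, columns=cols)

lvl0_values = dfmi.columns.get_level_values(0).unique()
# Name the mask's column levels like the sub-frames' so alignment matches level by level
named_mask = a_mask.rename_axis(['lvl1', 'lvl2'], axis=1)
# Mask each lvl0 sub-frame (columns lvl1, lvl2), then glue the blocks back under their keys
result = pd.concat([dfmi[k].mask(named_mask, 0) for k in lvl0_values],
                   keys=lvl0_values, axis=1)
print(result)
```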
Output:
lvl0 A B
lvl1 a b a b
lvl2 bar foo bar foo bar foo bar foo
A0 1 0 0 0 5 0 0 0
A1 9 0 11 0 13 0 15 0
A2 17 16 19 0 21 20 23 0
A3 0 24 0 26 0 28 0 30
One way you can do this:
pd.concat([dfmi['A'].mask(a_mask.rename_axis(['lvl1','lvl2'],axis=1),0),
dfmi['B'].mask(a_mask.rename_axis(['lvl1','lvl2'],axis=1),0)],
keys=['A','B'], axis=1)
print(a_mask)
lvl1 a b
lvl2 foo bar foo bar
A0 True False True True
A1 True False True False
A2 False False True False
A3 False True False True
Output:
A B
lvl1 a b a b
lvl2 bar foo bar foo bar foo bar foo
A0 1 0 0 0 5 0 0 0
A1 9 0 11 0 13 0 15 0
A2 17 16 19 0 21 20 23 0
A3 0 24 0 26 0 28 0 30