Boolean indexing in Pandas DataFrame with MultiIndex columns
I have a DataFrame with MultiIndex columns:
import numpy as np
import pandas as pd
columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
[1, 2, 3, 4],
[np.nan, 6, 7, 8],
[np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)
    n1       n2
     p   m    p   m
0  1.0   2  3.0   4
1  NaN   6  7.0   8
2  NaN  10  NaN  12
Now I want to set m to NaN whenever p is NaN. Here's the result I'm looking for:
    n1        n2
     p    m    p    m
0  1.0  2.0  3.0  4.0
1  NaN  NaN  7.0  8.0
2  NaN  NaN  NaN  NaN
I know how to find out where p is NaN, for example using
mask = df.xs('p', level=1, axis=1).isnull()
      n1     n2
0  False  False
1   True  False
2   True   True
However, I don't know how to use this mask to set the corresponding m values in df to NaN.
You can use pd.IndexSlice to select the p columns on level 1, flag which of their values are not NaN, replace False with NaN, and take the result as an ndarray. Multiplying the m columns by that array then sets m to NaN wherever p is NaN:
x = df.loc[:, pd.IndexSlice[:,'p']].notna().replace({False:float('nan')}).values
df.loc[:, pd.IndexSlice[:,'m']] *= x
    n1        n2
     p    m    p    m
0  1.0    2  3.0    4
1  NaN  NaN  7.0    8
2  NaN  NaN  NaN  NaN
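The multiplication above works because True and NaN sit together in an object array, which can be fragile across pandas versions. As a hedged alternative (a minimal sketch, not the answerer's method), the mask from the question can be applied one top-level group at a time with Series.mask. The loop and the variable name `name` are illustrative assumptions; the data is the example from the question.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
    [1, 2, 3, 4],
    [np.nan, 6, 7, 8],
    [np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)

# Mask from the question: one boolean column per top-level group (n1, n2),
# True wherever that group's p value is NaN.
mask = df.xs('p', level=1, axis=1).isna()

# For each group, blank out m where the matching p is NaN.
# Series.mask replaces entries where the condition is True with NaN,
# and assigning the result back replaces the whole column.
for name in mask.columns:
    df[(name, 'm')] = df[(name, 'm')].mask(mask[name])

print(df)
```

With the example data, the m columns end up as 2.0, NaN, NaN for n1 and 4.0, 8.0, NaN for n2, while the p columns are left unchanged. The m columns become float because NaN cannot be stored in an integer column.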
You can stack and unstack the transposed DataFrame so that the values are easy to select and change, then stack, unstack and transpose again to get back to the original layout:
df = df.T.stack(dropna=False).unstack(level=1)
df.loc[df['p'].isna(), 'm'] = np.nan
df = df.stack(dropna=False).unstack(1).T
After the first line, df is:
         m    p
n1 0   2.0  1.0
   1   6.0  NaN
   2  10.0  NaN
n2 0   4.0  3.0
   1   8.0  7.0
   2  12.0  NaN
And after the last line:
    n1        n2
     m    p    m    p
0  2.0  1.0  4.0  3.0
1  NaN  NaN  8.0  7.0
2  NaN  NaN  NaN  NaN
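Note that the round trip leaves the second column level in sorted order (m, p) rather than the original (p, m). If the original order matters, one possible fix is to reindex against the original columns. The sketch below rebuilds the frame shown above by hand, so it does not depend on the stack call itself; the names `roundtrip`, `sorted_columns` and `restored` are illustrative.

```python
import numpy as np
import pandas as pd

# Column order of the original DataFrame from the question.
original_columns = pd.MultiIndex.from_arrays(
    [['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']]
)

# The frame as it looks after the stack/unstack round trip (columns sorted to m, p).
sorted_columns = pd.MultiIndex.from_arrays(
    [['n1', 'n1', 'n2', 'n2'], ['m', 'p', 'm', 'p']]
)
roundtrip = pd.DataFrame(
    [
        [2.0, 1.0, 4.0, 3.0],
        [np.nan, np.nan, 8.0, 7.0],
        [np.nan, np.nan, np.nan, np.nan],
    ],
    columns=sorted_columns,
)

# Reorder the columns to match the original layout; values are unchanged.
restored = roundtrip.reindex(columns=original_columns)

print(restored)
```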