Boolean indexing in a Pandas DataFrame with MultiIndex columns

Question

I have a DataFrame with MultiIndex columns:

import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
    [1,      2,  3,      4],
    [np.nan, 6,  7,      8],
    [np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)

    n1       n2    
     p   m    p   m
0  1.0   2  3.0   4
1  NaN   6  7.0   8
2  NaN  10  NaN  12

Now I want to set m to NaN wherever p is NaN. Here's the result I'm looking for:

    n1        n2     
     p    m    p    m
0  1.0  2.0  3.0  4.0
1  NaN  NaN  7.0  8.0
2  NaN  NaN  NaN  NaN

I know how to find where p is NaN, for example using

mask = df.xs('p', level=1, axis=1).isnull()

      n1     n2
0  False  False
1   True  False
2   True   True

However, I don't know how to use this mask to set the corresponding m values in df to NaN.

Answer 1

You can use pd.IndexSlice to select the p columns on level 1, call notna() to get a boolean frame, replace False with NaN, and take the underlying array. Multiplying the m columns by that array leaves m unchanged where p is present and turns it into NaN where p is NaN:

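# True where p is present, NaN where p is missing
x = df.loc[:, pd.IndexSlice[:,'p']].notna().replace({False:float('nan')}).values
# value * True keeps the value; value * NaN gives NaN
df.loc[:, pd.IndexSlice[:,'m']] *= x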

    n1        n2     
     p    m    p    m
0  1.0    2  3.0    4
1  NaN  NaN  7.0    8
2  NaN  NaN  NaN  NaN
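
The mask from the question can also be applied directly. The following is a minimal alternative sketch, not the method used in this answer: it calls DataFrame.mask on the m columns and then writes each column back under its full (name, 'm') key. The names m_masked and name are illustrative only.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])
values = [
    [1,      2,  3,      4],
    [np.nan, 6,  7,      8],
    [np.nan, 10, np.nan, 12],
]
df = pd.DataFrame(values, columns=columns)

# Mask from the question: True where p is NaN, columns labelled n1 / n2.
mask = df.xs('p', level=1, axis=1).isnull()

# The m columns carry the same n1 / n2 labels, so the mask lines up with them.
# DataFrame.mask replaces entries with NaN wherever the condition is True.
m_masked = df.xs('m', level=1, axis=1).mask(mask)

# Write each masked column back under its full (name, 'm') key.
for name in m_masked.columns:
    df[(name, 'm')] = m_masked[name]

print(df)
```

Assigning whole columns by key replaces them outright, so the integer m columns simply become float columns that can hold NaN.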

Answer 2

You can transpose the DataFrame, then stack and unstack it so that p and m become ordinary columns that are easy to select and modify. Stacking, unstacking and transposing again brings back the wide layout:

df = df.T.stack(dropna=False).unstack(level=1)
df.loc[df['p'].isna(), 'm'] = np.nan

df = df.stack(dropna=False).unstack(1).T

After the first line, df is:

         m    p
n1 0   2.0  1.0
   1   6.0  NaN
   2  10.0  NaN
n2 0   4.0  3.0
   1   8.0  7.0
   2  12.0  NaN

And after the last line (note that the second column level comes back sorted, so m now precedes p):

    n1        n2     
     m    p    m    p
0  2.0  1.0  4.0  3.0
1  NaN  NaN  8.0  7.0
2  NaN  NaN  NaN  NaN
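
If the original p, m column order matters, it can be restored afterwards. The sketch below assumes the round-tripped frame looks like the output above; result and restored are illustrative names, and the frame is rebuilt by hand here so the example stands on its own.

```python
import numpy as np
import pandas as pd

# Original column order from the question.
columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['p', 'm', 'p', 'm']])

# Stand-in for the round-tripped result, whose columns are ordered (m, p).
sorted_columns = pd.MultiIndex.from_arrays([['n1', 'n1', 'n2', 'n2'], ['m', 'p', 'm', 'p']])
result = pd.DataFrame(
    [
        [2.0,    1.0,    4.0,    3.0],
        [np.nan, np.nan, 8.0,    7.0],
        [np.nan, np.nan, np.nan, np.nan],
    ],
    columns=sorted_columns,
)

# reindex reorders the existing columns to match the original MultiIndex.
restored = result.reindex(columns=columns)
print(restored)
```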
