
Boolean Indexing along the row axis of a DataFrame in pandas

import numpy as np
import pandas as pd
a = [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]]
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'], index=['foo', 'bar'])

In [5]: df
Out[5]: 
     a    b  c    d   e
foo  1  2.0  3  4.0   5
bar  6  NaN  8  NaN  10

I understand how normal boolean indexing works: for example, to select the rows where c > 3 I would write df[df.c > 3]. But what if I want to do the same thing along the row axis? Say I want only the columns where the value in row 'bar' is NaN.

I would have assumed that the following would do it, given the similarity of df['a'] and df.loc['bar']:

df.loc[df.loc['bar'].isnull()]

But it doesn't, and neither does the equivalent on my real data, results[results.loc['hl'].isnull()]. Both raise the same error: *** pandas.core.indexing.IndexingError: Unalignable boolean Series key provided

So how would I do it?

IIUC you want to use the boolean mask to mask the columns:

In [135]:
df[df.columns[df.loc['bar'].isnull()]]

Out[135]:
       b    d
foo  2.0  4.0
bar  NaN  NaN

Or you can use loc (ix on pandas versions before 1.0, which removed it) and reduce the Series to a NumPy array:

In [138]:
df.loc[:,df.loc['bar'].isnull().values]

Out[138]:
       b    d
foo  2.0  4.0
bar  NaN  NaN
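
As a self-contained sketch of the column-masking options above, the following rebuilds the frame from the question and selects the same columns three ways. The variable names (mask, by_series, by_array, by_labels) are only illustrative, and this assumes a reasonably recent pandas where Series.to_numpy() is available:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

# Boolean Series labelled by column name: True where row 'bar' is NaN.
mask = df.loc["bar"].isnull()

# 1) Pass the Series in the column position of .loc; it aligns on column labels.
by_series = df.loc[:, mask]

# 2) Pass a plain NumPy array, which is applied by position instead of by label.
by_array = df.loc[:, mask.to_numpy()]

# 3) Turn the mask into column labels first, then select those labels.
by_labels = df[df.columns[mask]]

print(by_series)
```

All three should give the same two-column frame (b and d). Passing the Series directly is probably the shortest option when the mask was derived from the same frame, since its labels already match the columns.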

The problem here is that the boolean series returned is a mask on the columns:

In [136]:
df.loc['bar'].isnull()

Out[136]:
a    False
b     True
c    False
d     True
e    False
Name: bar, dtype: bool

but your index contains none of these column names as labels, hence the error. So you need to apply the mask to the columns, or pass a NumPy boolean array as the column indexer.
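
To make the alignment problem concrete, here is a small sketch that checks which axis the mask's labels match and then reproduces the failure. The names mask_matches_columns, mask_matches_rows and error_name are made up for this example, and the exception is caught broadly because the import path of IndexingError differs between pandas versions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

mask = df.loc["bar"].isnull()

# The mask is labelled by the column names, not by the row labels.
mask_matches_columns = mask.index.equals(df.columns)
mask_matches_rows = mask.index.equals(df.index)

# Used as a row indexer, pandas tries to align the mask with the row labels
# ('foo', 'bar'), finds neither of them in the mask, and raises.
try:
    df.loc[mask]
    error_name = None
except Exception as exc:  # broad on purpose; see the note above
    error_name = type(exc).__name__

print(mask_matches_columns, mask_matches_rows, error_name)
```

The same mask is fine in the column position (df.loc[:, mask]) precisely because its labels a to e are the column labels.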


 