import numpy as np
import pandas as pd
a = [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]]
df = pd.DataFrame(a, columns=['a', 'b', 'c', 'd', 'e'], index=['foo', 'bar'])
In [5]: df
Out[5]:
     a    b  c    d   e
foo  1  2.0  3  4.0   5
bar  6  NaN  8  NaN  10
I understand how normal boolean indexing works: for example, to select the rows where c > 3, I would write df[df.c > 3]. However, what if I want to do the same thing across a row, i.e. filter the columns? Say I want only the columns whose value in row 'bar' is NaN.
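A side note on building the mask: NaN never compares equal to anything, including itself, so a literal == np.nan test selects nothing, while isnull() (or its alias isna()) flags the missing entries. A minimal sketch using the question's frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

# Comparing with == np.nan is always False, because NaN != NaN
eq_mask = df.loc["bar"] == np.nan
print(eq_mask.tolist())  # [False, False, False, False, False]

# isna() (alias of isnull()) flags the missing values in row 'bar'
na_mask = df.loc["bar"].isna()
print(na_mask.tolist())  # [False, True, False, True, False]
```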
I would have assumed that the following would do it, given the similarity between df['a'] and df.loc['bar']:
df.loc[df.loc['bar'].isnull()]
But it doesn't, and neither does results[results.loc['hl'].isnull()] (the same pattern on my actual data); both give the same error:
pandas.core.indexing.IndexingError: Unalignable boolean Series key provided
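The failure can be reproduced with the small frame. A single key in .loc selects rows, so pandas tries to align the mask (labelled a to e) with the row index (foo, bar) and gives up. This sketch catches a generic Exception because the import path of IndexingError differs between pandas versions; the exact message text also varies:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

mask = df.loc["bar"].isnull()  # boolean Series indexed by the column labels

try:
    df.loc[mask]  # one key -> row selection, so mask is aligned with df.index
    raised = False
except Exception as exc:  # IndexingError in the pandas versions I know of
    raised = True
    print(type(exc).__name__, exc)
```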
So how would I do it?
IIUC you want to use the boolean mask to mask the columns:
In [135]:
df[df.columns[df.loc['bar'].isnull()]]
Out[135]:
       b    d
foo  2.0  4.0
bar  NaN  NaN
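Because the boolean Series is labelled by column names, it can also be passed directly in the column slot of .loc, where pandas aligns it against df.columns instead of df.index. A sketch of that variant, checked against the df.columns[mask] selection:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

mask = df.loc["bar"].isnull()

# ':' keeps every row; the mask is aligned with, and applied to, the columns
result = df.loc[:, mask]
print(result)
print(list(result.columns))  # ['b', 'd']
```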
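Or you can convert the boolean Series to a NumPy array, which carries no labels to align, and pass it as the column indexer. This used to be written with .ix, which was deprecated in pandas 0.20 and removed in 1.0; .loc takes the same arguments here:
In [138]:
df.loc[:, df.loc['bar'].isnull().values]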
Out[138]:
       b    d
foo  2.0  4.0
bar  NaN  NaN
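The array form also works positionally with .iloc. Unlike .loc, .iloc rejects a labelled boolean Series as a mask, so the plain array from to_numpy() is required there. A sketch comparing both:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

# Plain bool ndarray: no index, so nothing is aligned, only positions matter
mask = df.loc["bar"].isna().to_numpy()

by_position = df.iloc[:, mask]  # positional column selection
by_label = df.loc[:, mask]      # .loc also accepts a bool array of matching length
print(by_position.columns.tolist())  # ['b', 'd']
```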
The problem here is that the boolean Series returned by df.loc['bar'].isnull() is a mask on the columns; its index holds the column labels:
In [136]:
df.loc['bar'].isnull()
Out[136]:
a False
b True
c False
d True
e False
Name: bar, dtype: bool
df.loc[mask] aligns this mask against your row index (foo, bar), which contains none of these labels, hence the error. So you need to apply the mask to df.columns, or pass a plain NumPy array, which has no labels to align, as the column indexer.
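The mismatch can be checked directly by comparing the mask's index with each axis of the frame. The function columns_where_null below is a hypothetical helper name, used only to wrap the column-mask approach for reuse:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 2, 3, 4, 5], [6, np.nan, 8, np.nan, 10]],
    columns=["a", "b", "c", "d", "e"],
    index=["foo", "bar"],
)

mask = df.loc["bar"].isnull()
print(mask.index.equals(df.columns))    # True: the labels are the column names
print(mask.index.isin(df.index).any())  # False: none of them is a row label


def columns_where_null(frame: pd.DataFrame, row: str) -> pd.DataFrame:
    """Hypothetical helper: keep only the columns whose value in `row` is NaN."""
    return frame.loc[:, frame.loc[row].isna()]


print(columns_where_null(df, "bar").columns.tolist())  # ['b', 'd']
print(columns_where_null(df, "foo").columns.tolist())  # []
```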