
Pandas: Boolean indexing on multiple columns

I have a data frame as below.

In [22]: import pandas as pd
In [23]: data2 = [{'a': 'x', 'b': 'y', 'c': 'q'}, {'a': 'x', 'b': 'p', 'c': 'q'}, {'a': 'p', 'b': 'q'}, {'a': 'q', 'b': 'y', 'c': 'q'}]
In [26]: df = pd.DataFrame(data2)
In [27]: df
Out[27]: 
   a  b    c
0  x  y    q
1  x  p    q
2  p  q  NaN
3  q  y    q

I want to use boolean indexing to select the rows that contain either x or y in the columns I check. This is how I am doing it at the moment:

In [29]: df[df['a'].isin(['x','y']) | (df['b'].isin(['x','y']))]
Out[29]: 
   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q

But I have over 50 columns that I need to check, and checking each column individually does not seem very Pythonic. I tried:

In [30]: df[df[['a','b']].isin(['x','y'])]

But the output is not what I expect. I get the following:

Out[30]: 
     a    b    c
0    x    y  NaN
1    x  NaN  NaN
2  NaN  NaN  NaN
3  NaN    y  NaN

I can drop the rows that are all NaN, but the values in the remaining rows are still missing.

For example, in row 0 column c is NaN, but I need that value.

Any suggestions on how to do this?

You can compare the whole DataFrame with 'x' and with 'y', combine the two results with a logical OR, and then reduce across the columns to find the rows that contain either value. Then use the resulting boolean Series as an index to select those rows.

df.loc[(df.eq('x') | df.eq('y')).any(axis=1)]
Out[68]: 
   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q
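
The same idea can probably be written with isin, which is closer to what the question tried. This is only a sketch: the names cols_to_check, targets, cell_mask and row_mask are made up for illustration, and cols_to_check stands in for the 50 or so columns mentioned in the question. The point is that isin on several columns returns a 2-D boolean frame, and indexing with a 2-D mask blanks individual cells, so the mask is first reduced to one boolean per row with any(axis=1).

```python
import pandas as pd

data2 = [
    {'a': 'x', 'b': 'y', 'c': 'q'},
    {'a': 'x', 'b': 'p', 'c': 'q'},
    {'a': 'p', 'b': 'q'},
    {'a': 'q', 'b': 'y', 'c': 'q'},
]
df = pd.DataFrame(data2)

# Hypothetical names: the columns to inspect and the values to look for.
cols_to_check = ['a', 'b']
targets = ['x', 'y']

# 2-D mask: one True/False per cell of the checked columns.
cell_mask = df[cols_to_check].isin(targets)

# 1-D mask: True for rows where at least one checked cell matched.
row_mask = cell_mask.any(axis=1)

# Indexing with a 1-D mask keeps whole rows, so column 'c' is preserved.
result = df[row_mask]
print(result)
```

Because the mask is built only from cols_to_check but applied to df, the unchecked columns keep their original values.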

This also works, applying the check row by row:

df.loc[df.apply(lambda x: 'x' in list(x) or 'y' in list(x), axis=1)]

   a  b  c
0  x  y  q
1  x  p  q
3  q  y  q
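
As a quick sanity check, the row-wise apply and a column-wise isin mask should pick the same rows on the sample data. The sketch below assumes the data from the question; the variable names row_wise and column_wise are made up for illustration. On larger frames the column-wise form is usually the faster of the two, though that is worth measuring rather than assuming.

```python
import pandas as pd

data2 = [
    {'a': 'x', 'b': 'y', 'c': 'q'},
    {'a': 'x', 'b': 'p', 'c': 'q'},
    {'a': 'p', 'b': 'q'},
    {'a': 'q', 'b': 'y', 'c': 'q'},
]
df = pd.DataFrame(data2)

# Row-wise check, as in the answer above: calls the lambda once per row.
row_wise = df.apply(lambda x: 'x' in list(x) or 'y' in list(x), axis=1)

# Column-wise check: test every cell at once, then reduce across columns.
column_wise = df.isin(['x', 'y']).any(axis=1)

# Both are boolean Series indexed like df and should agree row for row.
print(row_wise.equals(column_wise))
print(df[column_wise])
```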


 