
Selecting rows based on multiple columns in pandas - why are these two commands different?

I have a pandas DataFrame:

import pandas as pd
a = [0,0,1,1,2,7]
b = [1,0,0,1,1,4]
df = pd.DataFrame(list(zip(a,b)), columns = ('a','b'))
df
    a   b
0   0   1
1   0   0
2   1   0
3   1   1
4   2   1
5   7   4

I want to select all rows where both a and b are greater than zero.

Why does this command return only some of the desired rows?

df[(df['a'] & df['b'])>0]
    a   b
3   1   1
5   7   4
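The first command never compares each column with zero. When applied to two integer Series, & is a bitwise AND computed element by element, and only that result is then compared with 0. In binary, 2 & 1 is 10 & 01 = 0, so row 4 is dropped, while 7 & 4 is 111 & 100 = 100 = 4, so row 5 survives. The sketch below rebuilds the sample DataFrame from the question and prints the intermediate values:

```python
import pandas as pd

a = [0, 0, 1, 1, 2, 7]
b = [1, 0, 0, 1, 1, 4]
df = pd.DataFrame({'a': a, 'b': b})

# On integer Series, & is a bitwise AND applied element by element.
bitwise = df['a'] & df['b']
print(bitwise.tolist())  # [0, 0, 0, 1, 0, 4]

# The "> 0" is applied to the bitwise result, not to each column separately.
mask = bitwise > 0
print(df[mask])  # keeps only rows 3 and 5
```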

This other command, by contrast, returns all of the desired rows:

df[(df['a'] > 0) & (df['b'] > 0)]
    a   b
3   1   1
4   2   1
5   7   4
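The second command works because each comparison runs first and produces a boolean Series, and & then combines those booleans element by element. The parentheses around each comparison are required because & binds more tightly than > in Python. As a sketch, the same mask can also be built with DataFrame.gt and all(axis=1), which extends naturally to more columns:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2, 7], 'b': [1, 0, 0, 1, 1, 4]})

# Each comparison yields a boolean Series; & combines them element-wise.
mask = (df['a'] > 0) & (df['b'] > 0)
print(mask.tolist())  # [False, False, False, True, True, True]

# Equivalent: compare every selected column with 0, then require True in all of them.
mask_all = df[['a', 'b']].gt(0).all(axis=1)
print(df[mask_all])  # rows 3, 4 and 5
```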

That would mean the sum of each row should be greater than 1. Do you also have negative values in these columns? If not, this works for your data:

df[df.sum(axis=1)>1]
    a   b
3   1   1
4   2   1
5   7   4
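Note that the row-sum test checks the total rather than each value, so it matches the sample data here but is not equivalent to "both columns greater than zero" in general. A minimal sketch with one extra, hypothetical row (a = 2, b = 0) shows the two masks diverging:

```python
import pandas as pd

# Sample data from the question plus one hypothetical row (index 6) where b == 0.
df = pd.DataFrame({'a': [0, 0, 1, 1, 2, 7, 2], 'b': [1, 0, 0, 1, 1, 4, 0]})

by_sum = df[df.sum(axis=1) > 1]               # total of the row > 1
by_both = df[(df['a'] > 0) & (df['b'] > 0)]   # each column > 0

print(list(by_sum.index))   # [3, 4, 5, 6]
print(list(by_both.index))  # [3, 4, 5]
```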

Or, to handle negative values, keep only the values that are greater than zero and sum those along each row:
df[df[df>0].sum(axis=1)>1]
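Here df[df > 0] keeps the DataFrame's shape but replaces every value that is not greater than zero with NaN, and sum skips NaN by default, so negative values can no longer cancel out positive ones. The sketch below prints the intermediate steps on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 0, 1, 1, 2, 7], 'b': [1, 0, 0, 1, 1, 4]})

# Values that are not > 0 become NaN (the columns are upcast to float).
positive = df[df > 0]

# sum(axis=1) skips NaN, so a row that is entirely NaN sums to 0.0.
row_sums = positive.sum(axis=1)
print(row_sums.tolist())  # [1.0, 0.0, 1.0, 2.0, 3.0, 11.0]

print(df[row_sums > 1])  # rows 3, 4 and 5
```

Like the plain row sum, this still tests a total, so a row such as a = 2, b = 0 would also pass.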

