Selecting rows based on multiple columns in pandas - why are these two commands different?

Question

I have a pandas DataFrame:

import pandas as pd
a = [0,0,1,1,2,7]
b = [1,0,0,1,1,4]
df = pd.DataFrame(list(zip(a,b)), columns = ('a','b'))
df
    a   b
0   0   1
1   0   0
2   1   0
3   1   1
4   2   1
5   7   4

I want to select all rows where both a and b are greater than zero.

Why does this command return only some of the desired rows:

df[(df['a'] & df['b'])>0]
    a   b
3   1   1
5   7   4

While this other command returns all of the desired rows:

df[((df['a']>0) & (df['b']>0))]

    a   b
3   1   1
4   2   1
5   7   4

Answer 1

Would it be enough to require that the row sum is greater than 1? Do you also have negative values in these columns? For the sample data, this returns the desired rows:

df[df.sum(axis=1)>1]

Or, to sum only the values that are greater than zero along each row:

df[df[df>0].sum(axis=1)>1]
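
As for why the two commands in the question differ: on integer columns, `&` is an element-wise bitwise AND of the values, not a logical "both are non-zero". In row 4, `2 & 1` is `0b10 & 0b01`, which is 0, so that row fails the `> 0` test even though both values are positive. Comparing each column to zero first gives boolean Series, and `&` then behaves as a logical AND. Here is a minimal sketch on the sample data; the variable names (`bitwise`, `mask`, `counter`, and so on) are my own, and the last part uses a made-up row to show that a sum test is not equivalent to "both greater than zero" in general.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 0, 1, 1, 2, 7], "b": [1, 0, 0, 1, 1, 4]})

# On integer columns, & is an element-wise bitwise AND of the values.
# Row 4: 2 & 1 is 0b10 & 0b01 == 0, so it is dropped although both are > 0.
bitwise = df["a"] & df["b"]
print(bitwise.tolist())  # [0, 0, 0, 1, 0, 4]

# Comparing first yields boolean Series; & then acts as a logical AND.
mask = (df["a"] > 0) & (df["b"] > 0)
result = df[mask]
print(result.index.tolist())  # [3, 4, 5]

# Equivalent form that is easier to extend to many columns.
result_all = df[(df[["a", "b"]] > 0).all(axis=1)]

# Hypothetical counterexample (not from the question's data):
# a sum test also accepts a=2, b=0, where b is not greater than zero.
counter = pd.DataFrame({"a": [2], "b": [0]})
sum_selected = counter[counter.sum(axis=1) > 1]
both_selected = counter[(counter > 0).all(axis=1)]
print(len(sum_selected), len(both_selected))  # 1 0
```

So the sum-based filters happen to work for the sample DataFrame, but the explicit per-column comparison is the safer choice if a column can hold values other than 0 and 1.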
