
Pandas Boolean Indexing Issue

Can anyone explain the following behaviour? I am expecting all three rows to be returned.

import pandas as pd

test_dict = {
    'col1':[None, None, None],
    'col2':[True, False, True],
    'col3':[True, True, False]
}

df = pd.DataFrame(test_dict)

df[ df.col1 | df.col2 | df.col3 ]
# Returns only the first two rows (index 0 and 1)

Replacing the None values with empty strings using df.fillna('') appears to fix it, but I don't understand why the first two rows work fine if None is the issue.

Changing the order of the comparisons also affects it. If I swap col2 and col3 in the mask, the row with index 1 is no longer returned but the row with index 2 is. If col1 comes last, all rows are returned.

The problem is that the expression is evaluated from left to right. That is,

df.col1 | df.col2 | df.col3 == (df.col1 | df.col2) | df.col3

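Now, I think this is an implementation choice in pandas: None | True is evaluated as False. So in this case (df.col1 | df.col2) is all False, and the final mask is just df.col3. That's why you only see the first two rows. It also explains the ordering effect: with col1 last, df.col2 | df.col3 is already True in every row before col1 is considered.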
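To see the left-to-right grouping in action, here is a small sketch that rebuilds the DataFrame from the question and prints the intermediate mask. The names step1, mask and mask_col1_last are my own, and the printed values depend on how your pandas version handles None in an object column, so treat the output as something to check rather than a guarantee.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [None, None, None],   # object dtype, every value missing
    "col2": [True, False, True],
    "col3": [True, True, False],
})

# Python groups chained | operators from the left,
# so this sub-expression is evaluated first.
step1 = df.col1 | df.col2
print(step1.tolist())

# From here on, only col3 can still contribute a True.
mask = step1 | df.col3
print(mask.tolist())

# With col1 last, col2 | col3 is combined before col1 is looked at.
mask_col1_last = df.col2 | df.col3 | df.col1
print(mask_col1_last.tolist())
```

If the second list matches df.col3, you are seeing the same behaviour as in the question.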

To fix this, use

df[df.any(axis=1)]
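As a quick check of this fix, the sketch below applies df.any(axis=1) to the question's data. It also shows one alternative that keeps the | syntax by converting col1 to pandas' nullable boolean dtype first. Treating None as False is an assumption on my part about what the mask should mean, and the names mask_any, col1_filled and mask_or are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": [None, None, None],
    "col2": [True, False, True],
    "col3": [True, True, False],
})

# Row-wise "any": True if at least one value in the row is truthy.
# Missing values are skipped because skipna=True is the default.
mask_any = df.any(axis=1)
print(df[mask_any])

# Alternative that keeps the | operators: convert col1 to the nullable
# "boolean" dtype and state explicitly that missing means False (assumption).
col1_filled = df["col1"].astype("boolean").fillna(False)
mask_or = col1_filled | df["col2"] | df["col3"]
print(df[mask_or])
```

Either mask no longer depends on the order in which the columns are combined.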

