Pandas any() returning false with true values present

I have a largely empty dataframe of poorly formatted dates that I converted into DateTime format.

from io import StringIO

import pandas as pd

data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")

df = pd.read_csv(data, parse_dates=[1])

Which produces

    issue_date  issue_date_dt
0   NaN         NaT
1   NaN         NaT
2   19600215.0  1960-02-15
3   NaN         NaT
4   NaN         NaT

I'd expect that I could use df.any() to find whether there was a value in a row or column. axis=0 behaves as expected:

df.any(axis=0)

issue_date       True
issue_date_dt    True
dtype: bool

But axis=1 returns False for every row, even the one that contains values.

df.any(axis=1)

0    False
1    False
2    False
3    False
4    False
dtype: bool

I'm not entirely sure why this is occurring [1]. My best guess is that the differing datatypes along the first axis cause this unexpected result, since any works as expected along axis 0. However, I would argue that the workaround is actually the better approach anyway, as it makes it more immediately clear to a reader what exactly you are checking for.
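To make the mixed-dtype guess concrete, here is a minimal sketch that rebuilds the frame from the question and inspects the column dtypes. It only shows that each row spans a float column and a datetime column; it does not prove that this is the cause of the behaviour.

```python
from io import StringIO

import pandas as pd

data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")

df = pd.read_csv(data, parse_dates=[1])

# One column is float, the other datetime, so a row-wise reduction
# has to combine values of two different dtypes.
print(df.dtypes)

is_float = pd.api.types.is_float_dtype(df["issue_date"])
is_datetime = pd.api.types.is_datetime64_any_dtype(df["issue_date_dt"])
```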


This could potentially be a bug; if you agree, I would recommend opening an issue on the pandas GitHub page.

The workaround is straightforward: use notnull so that any operates on a homogeneous mask of type bool, rather than on a DataFrame containing mixed types.

df.notnull().any(axis=1)

0    False
1    False
2     True
3    False
4    False
dtype: bool
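For completeness, a self-contained sketch of the workaround on the sample data from the question. It uses notna, which is an alias of notnull; the names mask, has_value and rows_with_data are only illustrative.

```python
from io import StringIO

import pandas as pd

data = StringIO("""issue_date,issue_date_dt
,
,
19600215.0,1960-02-15
,
,""")

df = pd.read_csv(data, parse_dates=[1])

# notna() returns an all-bool frame, so any() reduces a homogeneous mask.
mask = df.notna()

# True for every row that holds at least one non-missing value.
has_value = mask.any(axis=1)

# Use the boolean Series to keep only the rows that contain data.
rows_with_data = df[has_value]
print(rows_with_data)
```

Passing axis by keyword also keeps the call valid on newer pandas versions, where the arguments of any are keyword-only.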

[1] This appears to have been recognized as a bug
