How to check in pandas whether a column is bool-like (contains only True, False, or NaN)?

I have a dataframe like so:

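import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'date': "20220701",             # scalar, broadcast to every row
        'a': [1, 2, np.nan],            # float64 because of the NaN
        'b': ['a', 'b', 'c'],           # object (strings)
        'c': [True, False, np.nan],     # object (bools mixed with NaN)
    }
)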

Columns b and c therefore both have dtype object. I'd like to efficiently identify the columns that would be boolean if they had no missing values.

The only solutions that came to mind are:

  1. check whether the unique values in a column are all in [True, False, NaN], but that would most likely be very inefficient.

  2. check whether (df.c.isnull() | (df.c == True) | (df.c == False)).all() is True
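A minimal sketch of the second idea, applied to every column. The helper name `is_bool_like` is hypothetical, not a pandas function. It uses an `isinstance` test instead of `== True`, because the equality comparison would also accept the numbers 1 and 0.

```python
import numpy as np
import pandas as pd


def is_bool_like(col: pd.Series) -> bool:
    """Hypothetical helper: True if every non-missing value in `col` is a bool."""
    if col.dtype == bool:
        return True          # already a proper boolean column
    if col.dtype != object:
        return False         # numeric, datetime, etc. cannot hold True/False/NaN
    non_null = col.dropna()
    if non_null.empty:
        return False         # all-NaN column: nothing to base a decision on
    # isinstance rejects 1/0 and 1.0/0.0, which `== True` / `== False` would accept
    return bool(non_null.map(lambda v: isinstance(v, (bool, np.bool_))).all())


df = pd.DataFrame(
    {
        "date": "20220701",
        "a": [1, 2, np.nan],
        "b": ["a", "b", "c"],
        "c": [True, False, np.nan],
    }
)

bool_like = [name for name in df.columns if is_bool_like(df[name])]
print(bool_like)
```

Treating an all-NaN column as not bool-like is a design choice of this sketch; return True there instead if that suits your data better.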

Here is one way to do it using assign.

Since the column is created via assign, it is temporary and not part of df, so nothing is added to or removed from the original dataframe.

# create a temporary column by forward-filling the NaN values, then check its dtype
>>> df.assign(temp=df['c'].ffill())['temp'].dtype
dtype('bool')
>>> df.assign(temp=df['c'].ffill())['temp'].dtype == 'bool'
True

or

# list the dtypes of all columns; the newly created one is of type bool
>>> df.assign(temp=df['c'].ffill()).dtypes
date     object
a       float64
b        object
c        object
temp       bool
dtype: object
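Two caveats seem worth noting about the ffill approach: a NaN in the first row has nothing to be filled from, so the column stays object, and newer pandas releases have deprecated the implicit downcast of object columns in ffill. A hedged variant that avoids both is to drop the missing values and ask pandas to re-infer the dtype. This is a sketch of an alternative, not the answer's original method.

```python
import numpy as np
import pandas as pd

# Leading NaN on purpose: ffill() alone could not fill the first row
df = pd.DataFrame(
    {
        "b": ["a", "b", "c"],
        "c": [np.nan, True, False],
    }
)

# Drop missing values, then let pandas infer the tightest dtype for what is left
c_is_bool = df["c"].dropna().infer_objects().dtype == bool
b_is_bool = df["b"].dropna().infer_objects().dtype == bool

print(c_is_bool, b_is_bool)
```

Like the original, this leaves df itself unchanged, since dropna() and infer_objects() both return new objects.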

Instead of using unique(), you could take something like df.b[:10] and use those first 10 values to guess whether the column is boolean.

This can give a wrong answer, since later rows may contain other values, but it will be faster than unique().
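One way to keep the speed of sampling without the risk of a wrong answer is to use the first rows only to reject columns early, and then confirm the survivors with a full check. The sketch below assumes `pandas.api.types.infer_dtype`, which reports "boolean" for values that are all bools once missing values are skipped. The function name `bool_like_columns` and the `sample_size` parameter are hypothetical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def bool_like_columns(frame: pd.DataFrame, sample_size: int = 10) -> list:
    """Hypothetical helper: names of object columns holding only True/False/NaN."""
    result = []
    for name in frame.columns:
        col = frame[name]
        if col.dtype != object:
            continue  # only object columns can mix bools with NaN
        # Cheap pre-check: look at the non-missing values among the first rows
        sample = col.head(sample_size).dropna()
        if len(sample) > 0 and infer_dtype(sample, skipna=True) != "boolean":
            continue  # rejected early without scanning the whole column
        # Full confirmation, so a late non-boolean value cannot slip through
        if infer_dtype(col, skipna=True) == "boolean":
            result.append(name)
    return result


df = pd.DataFrame(
    {
        "date": "20220701",
        "a": [1, 2, np.nan],
        "b": ["a", "b", "c"],
        "c": [True, False, np.nan],
    }
)

found = bool_like_columns(df)
print(found)
```

The full scan only runs for columns whose first rows already look boolean, so wide frames with many string columns are handled mostly by the cheap pre-check.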
