How to check in pandas that a column is bool-like (contains only True, False or NaN)?
I have a dataframe like so:
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'date': "20220701",
        'a': [1, 2, np.nan],  # np.nan instead of np.NaN, which NumPy 2.0 removed
        'b': ['a', 'b', 'c'],
        'c': [True, False, np.nan],
    }
)
Columns b and c therefore have dtype object. I'd like to be able to efficiently identify columns that could be boolean if they had no missing values.
The only solutions that came to my mind are:
check if the unique values in a column are in [True, False, NaN], but that would most likely be super inefficient.
check whether (df.c.isnull() | (df.c == True) | (df.c == False)).all()
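The second idea above can be sketched as a small helper. This is only an illustration, not a benchmarked solution: the name `is_bool_like` is hypothetical, and the choice to treat an all-missing column as not bool-like is an assumption. It uses `isinstance` rather than `== True` / `== False`, because in Python `1 == True` and `0 == False`, so the equality test would also accept a numeric column containing only 0, 1 and NaN.

```python
import numpy as np
import pandas as pd


def is_bool_like(s: pd.Series) -> bool:
    """Return True if every non-missing value in `s` is a bool (hypothetical helper)."""
    non_null = s.dropna()
    # Assumption: a column holding only missing values is not counted as bool-like.
    if non_null.empty:
        return False
    # isinstance rejects 0/1 numbers, which would pass an `== True` / `== False` test.
    return bool(non_null.map(lambda v: isinstance(v, (bool, np.bool_))).all())


df = pd.DataFrame(
    {
        "date": "20220701",
        "a": [1, 2, np.nan],
        "b": ["a", "b", "c"],
        "c": [True, False, np.nan],
    }
)

bool_like_columns = [col for col in df.columns if is_bool_like(df[col])]
print(bool_like_columns)  # ['c']
```

If efficiency matters, the helper could be applied only to columns of dtype object, since columns that already have dtype bool need no check.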
Here is one way to do it using assign. Since the column is created via assign, it is temporary and not part of df, so nothing is lost or added.
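# Create a temporary column by forward-filling the NA values, then check its dtype.
# infer_objects() makes the conversion to bool explicit; recent pandas versions
# deprecate the implicit downcast that ffill() used to perform on object columns.
>>> df.assign(temp=df['c'].ffill().infer_objects())['temp'].dtype
dtype('bool')
>>> df.assign(temp=df['c'].ffill().infer_objects())['temp'].dtype == 'bool'
True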
or
# List the dtypes of all columns; the newly created temp column is of type bool.
>>> df.assign(temp=df['c'].ffill().infer_objects()).dtypes
date     object
a       float64
b        object
c        object
temp       bool
dtype: object
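One caveat with the forward-fill approach is that ffill cannot fill a missing value in the first row, so a column that starts with NaN still contains a missing value afterwards. The sketch below shows that case and one possible alternative, `pandas.api.types.infer_dtype` with `skipna=True`, which ignores missing values when classifying a column. This function is not part of the answer above; it is offered here as an assumption about what may suit the question.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype

# A bool-like column whose first value is missing.
s = pd.Series([np.nan, True, False], dtype=object)

# Forward fill has nothing to propagate into the first row, so NaN remains.
still_missing = bool(s.ffill().isna().any())
print(still_missing)  # True

# infer_dtype skips missing values and reports the kind of the remaining ones.
kind = infer_dtype(s, skipna=True)
print(kind)  # boolean

# A column mixing bools with other values is not reported as boolean.
mixed_kind = infer_dtype(pd.Series([True, "x", np.nan], dtype=object), skipna=True)
print(mixed_kind != "boolean")  # True
```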
Instead of using unique(), you could use something like df.b[:10] and check only those first 10 samples to guess whether your data is boolean or not. I think it can fail, but it will be faster than unique().
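A minimal sketch of this sampling idea, together with the failure mode it mentions. The helper names `sample_looks_bool` and `all_values_bool` and the default sample size of 10 are illustrative assumptions. Because the sample can be wrong, it is used here only as a cheap first filter, followed by a full check on the columns that pass.

```python
import numpy as np
import pandas as pd


def _is_bool(value) -> bool:
    return isinstance(value, (bool, np.bool_))


def sample_looks_bool(s: pd.Series, sample_size: int = 10) -> bool:
    """Cheap guess based on the first `sample_size` values only (hypothetical helper)."""
    head = s.iloc[:sample_size].dropna()
    # An empty sample is inconclusive; it is treated as "not bool-like" here.
    return len(head) > 0 and all(_is_bool(v) for v in head)


def all_values_bool(s: pd.Series) -> bool:
    """Full check over every non-missing value (hypothetical helper)."""
    non_null = s.dropna()
    return len(non_null) > 0 and all(_is_bool(v) for v in non_null)


# The first 10 values are bools, but a string appears later in the column.
s = pd.Series([True, False] * 5 + ["oops"], dtype=object)

print(sample_looks_bool(s))  # True: the sample is fooled
print(all_values_bool(s))    # False: the full check catches the string
```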