What is the fastest and/or most idiomatic way of finding out whether an object column has multiple datatypes in pandas?
I have a dataframe with a column like this:
df.Chromosome
# 0 1
# 1 1
# 2 1
# 3 1
# 4 1
# ..
# 94391 Y
# 94392 Y
# 94393 Y
# 94394 Y
# 94395 Y
# Name: Chromosome, Length: 94396, dtype: object
By doing
df.Chromosome.apply(type).drop_duplicates()
I find that it consists of two types of data:
0 <class 'int'>
65536 <class 'str'>
Name: Chromosome, dtype: object
Is there a faster and more idiomatic way of checking whether a column contains values of more than one type?
I think your solution is nice. Some other alternatives:
df.Chromosome.map(type).unique()
set(df.Chromosome.map(type))
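
To make the two alternatives above easy to try, here is a minimal self-contained sketch. The small `chromosome` Series is a made-up stand-in for `df.Chromosome`, not the asker's data.

```python
import pandas as pd

# Hypothetical stand-in for df.Chromosome: Python ints followed by strings.
chromosome = pd.Series([1, 1, 2, "X", "Y", "Y"], dtype=object, name="Chromosome")

# NumPy object array of the distinct type objects, in order of first appearance.
types_array = chromosome.map(type).unique()

# The same information as a plain Python set.
types_set = set(chromosome.map(type))

# The column is "mixed" when more than one type shows up.
has_multiple_types = len(types_set) > 1
print(types_array, types_set, has_multiple_types)
```

Both variants still call `type` once per row; they differ only in how the distinct results are collected.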
It is also possible to remove duplicate values first to improve performance:
df.Chromosome.drop_duplicates().apply(type).drop_duplicates()
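
The idea is that `type` is then called only on the distinct values, which should help when a long column has few of them. A rough sketch of this as a helper follows; the function name `column_types` and the sample data are illustrative assumptions, not part of the original answer.

```python
import pandas as pd


def column_types(series: pd.Series) -> set:
    """Return the set of Python types found among the distinct values.

    Hypothetical helper: deduplicates first so that type() runs once per
    distinct value rather than once per row.
    """
    return set(series.drop_duplicates().map(type))


# Made-up data: many rows, only four distinct values.
chromosome = pd.Series([1] * 1000 + [2] * 1000 + ["X"] * 1000 + ["Y"] * 1000,
                       dtype=object, name="Chromosome")

found = column_types(chromosome)
print(found, len(found) > 1)
```

One caveat to keep in mind: values of different types that compare equal (for example `1` and `1.0`) may be collapsed by the deduplication step, in which case one of the types could go unreported.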
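You can also apply the same check to every column of the dataframe at once (note that `applymap` is deprecated since pandas 2.1 in favor of `DataFrame.map`):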
df.applymap(type).drop_duplicates()
Another alternative:
{type(_) for _ in set(df.Chromosome.value_counts().index)}
This is quite slow.
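
If only a yes/no answer is needed, `pandas.api.types.infer_dtype` may be worth trying. It was not suggested in the thread, so treat this as an additional sketch: it inspects the values and returns a label such as `"string"`, `"integer"`, or a `"mixed..."` label when several kinds are present. The sample Series below are invented for illustration, and no timing comparison is claimed.

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Hypothetical columns: one mixing ints and strings, one holding strings only.
mixed = pd.Series([1, 2, "X", "Y"], dtype=object)
only_str = pd.Series(["1", "2", "X", "Y"], dtype=object)

mixed_label = infer_dtype(mixed)
str_label = infer_dtype(only_str)

# Labels beginning with "mixed" signal more than one kind of value.
print(mixed_label, mixed_label.startswith("mixed"))
print(str_label, str_label.startswith("mixed"))
```

Unlike the `type`-based approaches, this reports a coarse label rather than the exact set of Python types, so it answers "is it mixed?" but not "which types are present?".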