
What is the fastest and/or most idiomatic way of finding out whether an object column has multiple datatypes in pandas?

I have a dataframe with a column like this:

df.Chromosome
# 0        1
# 1        1
# 2        1
# 3        1
# 4        1
#         ..
# 94391    Y
# 94392    Y
# 94393    Y
# 94394    Y
# 94395    Y
# Name: Chromosome, Length: 94396, dtype: object

By running df.Chromosome.apply(type).drop_duplicates() I find that it contains two types of data:

0        <class 'int'>
65536    <class 'str'>
Name: Chromosome, dtype: object

Is there a faster and more idiomatic way of checking whether a column contains values of more than one type?

I think your solution is nice. Some alternatives:

df.Chromosome.map(type).unique()

set(df.Chromosome.map(type))
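
For a plain yes/no answer, the same idea can be reduced to counting the distinct types. A minimal sketch, assuming a small stand-in column (the full data is not shown in the question) and a hypothetical helper name, has_mixed_types:

```python
import pandas as pd

# Stand-in for df.Chromosome: ints followed by strings, dtype object.
chromosome = pd.Series([1, 1, 2, "X", "Y", "Y"], dtype=object, name="Chromosome")


def has_mixed_types(column: pd.Series) -> bool:
    """Return True if the column holds values of more than one Python type."""
    # map(type) yields a Series of type objects; nunique counts the distinct ones.
    return column.map(type).nunique() > 1


types_found = set(chromosome.map(type))  # the distinct types themselves
mixed = has_mixed_types(chromosome)      # just the yes/no answer
```

The set is handy when you want to see which types are present; the helper is enough when you only need to know whether the column needs cleaning.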

It is also possible to remove duplicate values first to improve performance:

df.Chromosome.drop_duplicates().apply(type).drop_duplicates()
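
To illustrate why deduplicating first helps: type() is then called once per distinct value instead of once per row. A small sketch on made-up data (the values are an assumption, not the original column):

```python
import pandas as pd

# Stand-in column with repeated values, dtype object.
chromosome = pd.Series([1, 1, 2, "X", "Y", "Y"], dtype=object, name="Chromosome")

# Type check over every row.
types_all_rows = set(chromosome.map(type))

# Type check over distinct values only; far fewer calls to type() when
# the column has few distinct values relative to its length.
types_distinct = set(chromosome.drop_duplicates().map(type))
```

One caveat, as far as I know: deduplication relies on equality, so values that compare equal across types (for example 1 and 1.0) may collapse into a single entry and hide one of the types. For a column of ints and strings like the one in the question this should not matter.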

You can also apply it to the whole DataFrame:

df.applymap(type).drop_duplicates()
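
If the goal is to find which columns are mixed, a per-column count may be easier to read than the deduplicated frame of types. A sketch under assumptions: the frame and the Start column are invented for illustration, and it uses apply with Series.map rather than applymap, which newer pandas versions deprecate in favour of DataFrame.map:

```python
import pandas as pd

# Hypothetical frame: one mixed column, one homogeneous column.
df = pd.DataFrame(
    {
        "Chromosome": pd.Series([1, 2, "X", "Y"], dtype=object),
        "Start": [100, 200, 300, 400],
    }
)

# Number of distinct Python types per column.
type_counts = df.apply(lambda col: col.map(type).nunique())

# Names of the columns that hold more than one type.
mixed_columns = type_counts[type_counts > 1].index.tolist()
```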

Another alternative:

{type(_) for _ in set(df.Chromosome.value_counts().index)}

This is quite slow.
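
Not one of the approaches above, but possibly closer to "idiomatic": pandas ships pandas.api.types.infer_dtype, which inspects the values and returns a label describing them. A minimal sketch on stand-in data; I have not timed it against the other options on the original column:

```python
import pandas as pd
from pandas.api.types import infer_dtype

# Stand-in columns: one mixing ints and strings, one with strings only.
mixed = pd.Series([1, 1, 2, "X", "Y"], dtype=object)
only_str = pd.Series(["1", "2", "X", "Y"], dtype=object)

# infer_dtype returns a string label describing the values it finds.
mixed_label = infer_dtype(mixed)
only_str_label = infer_dtype(only_str)

# Labels starting with "mixed" signal more than one kind of value.
is_mixed = mixed_label.startswith("mixed")
```

Here the mixed column is labelled mixed-integer and the string-only column is labelled string. The label tells you that the column is heterogeneous, but not which Python types are involved, so map(type) is still the tool for that.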
