
Check which value in Pandas Dataframe Column is String

I have a DataFrame with around 0.2 million records. When I pass this DataFrame as input to a model, it throws this error:

Cast string to float is not supported.

Is there any way I can check which particular value in the DataFrame is causing this error?

I've tried running this command to check whether any value in the column is a string.

False in map((lambda x: type(x) == str), trainDF['Embeddings'])

Output:

True

In pandas, a mixed-type column like this is usually converted with:

df['col'] = pd.to_numeric(df['col'],errors = 'coerce')

This returns NaN for every item that cannot be converted to float. You can then drop those rows with dropna or fill in a default value with fillna.
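To find which values are causing the error, rather than only converting them, the coerced result can be compared with the original column. The following is a minimal sketch under assumptions: the sample data is made up for illustration, and the names converted, bad_mask, bad_rows, cleaned and filled are hypothetical. Only the column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
trainDF = pd.DataFrame({'Embeddings': ['100', '23.2', '44a', '453.2']})

# Values that cannot be parsed as numbers become NaN instead of raising.
converted = pd.to_numeric(trainDF['Embeddings'], errors='coerce')

# A value is "bad" if it was present originally but is NaN after conversion.
bad_mask = converted.isna() & trainDF['Embeddings'].notna()
bad_rows = trainDF[bad_mask]
print(bad_rows)

# Option 1: drop the rows that failed to convert.
cleaned = trainDF.assign(Embeddings=converted).dropna(subset=['Embeddings'])

# Option 2: keep every row and substitute a default value (0.0 is arbitrary).
filled = trainDF.assign(Embeddings=converted.fillna(0.0))
```

Checking notna() on the original column keeps values that were already missing from being reported as conversion failures.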

You can loop over the rows of trainDF and use try/except to find the ones that fail to convert.

>>> import pandas as pd
>>> trainDF = pd.DataFrame({'Embeddings':['100', '23.2', '44a', '453.2']})
>>> trainDF
  Embeddings
0        100
1       23.2
2        44a
3      453.2
>>> error_indices = []
>>> for idx, row in trainDF.iterrows():
...     try:
...         trainDF.loc[idx, 'Embeddings'] = float(row['Embeddings'])
...     except (TypeError, ValueError):
...         error_indices.append(idx)
...
>>> trainDF
  Embeddings
0      100.0
1       23.2
2        44a
3      453.2
>>> trainDF.loc[error_indices]
  Embeddings
2        44a
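The same check can also be written without iterrows by mapping a small helper over the column. This is only a sketch, not part of the answer above: is_floatable is a hypothetical helper, and the mixed sample data is invented to show strings, numbers and missing values side by side.

```python
import pandas as pd

def is_floatable(value):
    """Return True if float(value) succeeds (hypothetical helper)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False

# Hypothetical mixed-type column: a float, two strings and a missing value.
trainDF = pd.DataFrame({'Embeddings': [100.0, '23.2', '44a', None]})

# Boolean Series: True where the value can be converted to float.
floatable = trainDF['Embeddings'].map(is_floatable)

# Rows whose value cannot be converted at all.
error_rows = trainDF[~floatable]

# Rows whose value is a Python str, whether or not it parses as a number.
string_rows = trainDF[trainDF['Embeddings'].map(lambda v: isinstance(v, str))]

print(error_rows)
print(string_rows)
```

Unlike the loop above, this version leaves trainDF unchanged and only reports the offending rows. The second filter lists every string value, which matters if the model rejects strings even when they look numeric.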

