Check which value in a Pandas DataFrame column is a string

Question

I have a DataFrame that consists of around 0.2 million records. When I pass this DataFrame as input to a model, it throws this error:

Cast string to float is not supported.

Is there any way I can check which particular value in the DataFrame is causing this error?

I've tried running this command to check whether any value in the column is a string.

False in map((lambda x: type(x) == str), trainDF['Embeddings'])

Output:

True

Answer 1

In pandas, when we convert a mixed-type column like this, we use:

df['col'] = pd.to_numeric(df['col'],errors = 'coerce')

This returns NaN for every item that cannot be converted to float. You can then drop those rows with dropna or fill in a default value with fillna.
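To see which values were actually coerced, rather than only cleaning them up, one option is to compare the converted column against the original. This is a minimal sketch, not the answerer's code: the sample data and the names `converted`, `bad_mask`, and `bad_rows` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the real column
df = pd.DataFrame({'col': ['100', '23.2', '44a', None, 453.2]})

# Unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(df['col'], errors='coerce')

# A value that was present originally but is NaN after conversion
# is one that could not be parsed as a number
bad_mask = converted.isna() & df['col'].notna()
bad_rows = df.loc[bad_mask]
print(bad_rows)
```

The `notna()` check keeps values that were already missing from being reported as conversion failures.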

Answer 2

You can loop over trainDF's indices and use try/except to find the rows that raise errors.

>>> import pandas as pd
>>> trainDF = pd.DataFrame({'Embeddings':['100', '23.2', '44a', '453.2']})
>>> trainDF
  Embeddings
0        100
1       23.2
2        44a
3      453.2
>>> error_indices = []
>>> for idx, row in trainDF.iterrows():
...     try:
...         trainDF.loc[idx, 'Embeddings'] = float(row['Embeddings'])
...     except (ValueError, TypeError):
...         error_indices.append(idx)
...
>>> trainDF
  Embeddings
0      100.0
1       23.2
2        44a
3      453.2
>>> trainDF.loc[error_indices]
  Embeddings
2        44a
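Note that the check in the question only reports whether at least one value is not a string. It does not say which rows hold strings. If the column mixes real floats with strings, a boolean mask built with `isinstance` can list the string rows directly. This is a rough sketch with made-up sample data; `is_str` and `string_rows` are illustrative names.

```python
import pandas as pd

# Hypothetical column mixing real floats with strings
trainDF = pd.DataFrame({'Embeddings': [100.0, '23.2', 44.5, 'abc']})

# True for every element that is a Python str
is_str = trainDF['Embeddings'].map(lambda x: isinstance(x, str))

# Select only the rows whose value is a string
string_rows = trainDF.loc[is_str]
print(string_rows)
```

On a large frame this avoids a row-by-row Python loop with `iterrows`, although `map` with a lambda still calls the function once per element.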
