
Check which value in Pandas Dataframe Column is String

I have a DataFrame with around 200,000 records. When I pass this DataFrame to a model as input, it throws this error:

Cast string to float is not supported.

Is there any way I can check which particular value in the data frame is causing this error?

I've tried running this command and checking if any value is a string in the column.

False in map((lambda x: type(x) == str), trainDF['Embeddings'])

Output:

True

In pandas, a column with mixed types is usually converted like this:

df['col'] = pd.to_numeric(df['col'],errors = 'coerce')

This returns NaN for every item that cannot be converted to float. You can then drop those rows with dropna or fill in a default value with fillna.
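To see which values would be coerced, rather than only dropping or filling them, one option is to compare the coerced result against the original column. The following is a minimal sketch under assumptions: the column name `col` follows the snippet above, and the sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; replace with the real column.
df = pd.DataFrame({'col': ['100', '23.2', '44a', None, '453.2']})

# Values that cannot be parsed as numbers become NaN.
converted = pd.to_numeric(df['col'], errors='coerce')

# Keep rows that had a value originally but failed the conversion,
# so genuinely missing entries are not reported as bad strings.
bad_mask = converted.isna() & df['col'].notna()
bad_rows = df.loc[bad_mask, 'col']
print(bad_rows)
```

The index of `bad_rows` gives the positions of the offending records, which can be inspected before deciding between dropna and fillna.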

You should loop over trainDF's indices and use try/except to find the rows that raise errors.

>>> import pandas as pd
>>> trainDF = pd.DataFrame({'Embeddings':['100', '23.2', '44a', '453.2']})
>>> trainDF
  Embeddings
0        100
1       23.2
2        44a
3      453.2
>>> error_indices = []
>>> for idx, row in trainDF.iterrows():
...     try:
...         trainDF.loc[idx, 'Embeddings'] = float(row['Embeddings'])
...     except (ValueError, TypeError):
...         error_indices.append(idx)
...
>>> trainDF
  Embeddings
0      100.0
1       23.2
2        44a
3      453.2
>>> trainDF.loc[error_indices]
  Embeddings
2        44a
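If the column should stay untouched while searching for bad values, the same try/except idea can be wrapped in a small helper. This is only a sketch: `find_unconvertible` is a hypothetical name, not a pandas function, and it assumes that "bad" means any value that Python's built-in float() rejects.

```python
import pandas as pd

def find_unconvertible(series):
    """Return the index labels whose values float() cannot convert."""
    bad = []
    for idx, value in series.items():
        try:
            float(value)
        except (ValueError, TypeError):
            # ValueError: unparsable string such as '44a'
            # TypeError: non-scalar or None values
            bad.append(idx)
    return bad

trainDF = pd.DataFrame({'Embeddings': ['100', '23.2', '44a', '453.2']})
error_indices = find_unconvertible(trainDF['Embeddings'])
print(trainDF.loc[error_indices])
```

Unlike the loop above, this version does not write converted values back into the DataFrame, so it can be run repeatedly while inspecting the data.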
