Find non-numeric values in pandas dataframe column

Question

I have a column in a dataframe that contains numbers and strings. So I replaced the strings with numbers via df.column.replace(["A", "B", "C", "D"], [1, 2, 3, 4], inplace=True).

But the column is still dtype "object". I cannot sort the column (TypeError: '<' not supported between instances of 'str' and 'int').

Now how can I identify those numbers that are strings? I tried print(df[pd.to_numeric(df['column']).isnull()]) and it gives back an empty dataframe, as expected. However I read that this does not work in my case (actual numbers saved as strings). So how can I identify those numbers saved as a string?

Am I right that if a column only contains REAL numbers (int or float) it will automatically change to dtype int or float?

Thank you!

Answer 1

You can use pd.to_numeric with something like:

df['column'] = pd.to_numeric(df['column'], errors='coerce')

The errors argument accepts a few options; see the pandas reference documentation for pd.to_numeric.
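As a minimal sketch of how the errors argument behaves, assuming a small made-up Series (the values below are illustrative, not from the question): errors='coerce' turns anything unparseable into NaN, while the default errors='raise' stops with a ValueError.

```python
import pandas as pd

# Hypothetical sample data: real numbers, numbers stored as strings, and text.
s = pd.Series([1, "2", "3.5", "abc"], dtype=object)

# errors="coerce": values that cannot be parsed become NaN.
coerced = pd.to_numeric(s, errors="coerce")

# errors="raise" (the default): an unparseable value raises ValueError.
try:
    pd.to_numeric(s, errors="raise")
    raised = False
except ValueError:
    raised = True

print(coerced)
print("raised:", raised)
```

Note that coercing silently replaces bad values, so it is worth inspecting what became NaN before overwriting the column.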

Answer 2

Expanding on Francesco's answer, it's possible to create a mask of non-numeric values and identify the unique instances to handle or remove. This relies on the fact that values which can't be coerced are treated as nulls.

is_non_numeric = pd.to_numeric(df['column'], errors='coerce').isnull()
df[is_non_numeric]['column'].unique()

Or alternatively in a single line:

df[pd.to_numeric(df['column'], errors='coerce').isnull()]['column'].unique()
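The mask above finds values that cannot be parsed as numbers at all. To find numbers that are stored as strings (what the question asks about), one possible approach is to combine a Python type check with the same coercion. This is only a sketch on made-up data; the dataframe contents and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mixed column: ints, floats, numeric strings, and plain text.
df = pd.DataFrame({"column": [1, "2", 3.0, "4.5", "abc"]})

# True where the stored Python object is a str.
is_str = df["column"].map(lambda v: isinstance(v, str))

# True where the value can be parsed as a number.
is_numeric_like = pd.to_numeric(df["column"], errors="coerce").notna()

# Strings that parse as numbers: the "numbers saved as strings".
numbers_as_strings = df.loc[is_str & is_numeric_like, "column"]

# Count how many values of each Python type the column holds.
type_counts = df["column"].map(type).value_counts()

print(numbers_as_strings.tolist())
print(type_counts)
```

The type counts give a quick overview of why the column is still dtype object: as long as any str remains, pandas keeps the column as object.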

Answer 3

You can change the dtype by converting the column and assigning the result back:

    df['column'] = df['column'].astype(int)
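A short sketch of this conversion on made-up data, assuming every value in the column is an integer or an integer-like string. If any value cannot be converted, astype(int) raises a ValueError instead of producing NaN, so it only suits columns that are already clean.

```python
import pandas as pd

# Hypothetical column mixing ints and integer-like strings.
df = pd.DataFrame({"column": [3, "1", 2, "4"]})
dtype_before = df["column"].dtype  # object

# Convert and assign back; the column becomes an integer dtype.
df["column"] = df["column"].astype(int)

# Sorting now works because all values are ints.
sorted_values = df["column"].sort_values().tolist()

# A value that is not an integer literal makes astype(int) fail.
try:
    pd.Series([1, "abc"], dtype=object).astype(int)
    failed = False
except ValueError:
    failed = True

print(dtype_before, df["column"].dtype, sorted_values, failed)
```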

3 answers

solution1
0 2020-12-24 10:02:34

solution2
0 2022-11-18 08:01:21

solution3
-1 ACCEPTED 2020-06-14 18:06:40
