
Find non-numeric values in pandas dataframe column

I have a column in a dataframe that contains both numbers and strings, so I replaced the strings with numbers via df.column.replace(["A", "B", "C", "D"], [1, 2, 3, 4], inplace=True) .

But the column still has dtype "object", and I cannot sort it (TypeError: '<' not supported between instances of 'str' and 'int').

Now how can I identify the numbers that are stored as strings? I tried print(df[pd.to_numeric(df['column']).isnull()]) and it returns an empty dataframe, as expected. However, I read that this does not work in my case (actual numbers saved as strings), because those strings are parsed successfully and so never show up as null. So how can I identify the numbers that are saved as strings?

Am I right that if a column only contains REAL numbers (int or float) it will automatically change to dtype int or float?

Thank you!

You can use pd.to_numeric with something like:

df['column'] = pd.to_numeric(df['column'], errors='coerce')

The errors argument accepts a few options, for example 'raise' (the default) and 'coerce'; see the pd.to_numeric reference documentation.
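To make the difference between the options concrete, here is a minimal sketch on made-up data (the sample values are an assumption, not the asker's actual column). It shows the default raising on an unparseable value, and errors='coerce' producing a numeric column that can be sorted:

```python
import pandas as pd

# Hypothetical sample: real numbers, numbers stored as strings, one leftover letter
df = pd.DataFrame({"column": [3, "1", 2.5, "E", "10"]})

# With the default errors='raise', the unparseable "E" triggers a ValueError
try:
    pd.to_numeric(df["column"])
    raised = False
except ValueError:
    raised = True

# With errors='coerce', parseable values are converted and the rest become NaN
converted = pd.to_numeric(df["column"], errors="coerce")

print(df["column"].dtype)                # object
print(converted.dtype)                   # float64
print(converted.sort_values().tolist())  # sortable now; NaN is placed last
```

Note that coercing silently discards whatever could not be parsed, so it is worth checking which values turned into NaN before overwriting the original column.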

Expanding on Francesco's answer, it's possible to create a mask of non-numeric values and identify the unique instances to handle or remove. This relies on the fact that values that can't be coerced are returned as nulls (NaN).

is_non_numeric = pd.to_numeric(df['column'], errors='coerce').isnull()
df[is_non_numeric]['column'].unique()

Or alternatively in a single line:

df[pd.to_numeric(df['column'], errors='coerce').isnull()]['column'].unique()
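The mask above finds values that are not numeric at all. For the original question (numbers that are stored as strings), one possible approach is to combine it with a check on the Python type of each value. This is only a sketch on made-up data; the names is_str and parses are illustrative:

```python
import pandas as pd

# Hypothetical sample: ints, numeric strings, and one non-numeric string
df = pd.DataFrame({"column": [1, "2", 3, "4.5", "B"]})

# True where the stored object is a str, whatever it looks like
is_str = df["column"].map(lambda v: isinstance(v, str))

# True where the value can be parsed as a number
parses = pd.to_numeric(df["column"], errors="coerce").notna()

# Strings that parse are "numbers saved as strings"; the rest are truly non-numeric
numeric_strings = df.loc[is_str & parses, "column"].tolist()
other_strings = df.loc[is_str & ~parses, "column"].tolist()

print(numeric_strings)  # ['2', '4.5']
print(other_strings)    # ['B']
```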

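You can change the dtype by converting the column and assigning the result back (this raises an error if any value cannot be converted to int):

    df['column'] = df['column'].astype(int)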
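A minimal sketch of changing the dtype with astype, again on made-up data. It assumes every value is either an int or a string holding a whole number; anything else makes astype(int) fail, as the second part shows:

```python
import pandas as pd

# Hypothetical sample: an object column mixing ints and integer-like strings
df = pd.DataFrame({"column": [3, "1", "2"]})

# astype returns a new Series, so assign it back to the column
df["column"] = df["column"].astype(int)
print(df["column"].dtype)  # an integer dtype instead of object
print(df["column"].sort_values().tolist())

# A value that is not an integer literal makes the conversion fail
bad = pd.Series([1, "B"], dtype=object)
try:
    bad.astype(int)
    failed = False
except ValueError:
    failed = True
print(failed)
```

If some values may be unparseable, pd.to_numeric with errors='coerce' is the more forgiving choice, at the cost of producing NaN for those entries.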
