How to find glitches in a dataset?

Question

Lately I've been running into datasets that contain glitches. For instance, a dataset has a numeric column: looking at it, you can easily see that most of the fields hold numbers, but its datatype is object, and some of the fields hold non-numeric values.
For example:
A dataset has an "Age" column: [23, 34, 54, 33, pp, 27, 43], and its datatype is object.
Notice the string value "pp" among the numbers. That is what I call a glitch in the dataset.
My question is: how can I find the rows that contain glitches like "pp"?

Here is an image of what I want to discuss with you

Thanks.

Answer 1

You can use pd.to_numeric() with errors='coerce' to convert non-numeric values to NaN, then check for NaN with isna(). Finally, use .loc to select the row(s) where the conversion produced NaN:

df.loc[pd.to_numeric(df['Age'], errors='coerce').isna()]

Demo

import pandas as pd

data = {"Age": [23, 34, 54, 33, 'pp', 27, 43]}
df = pd.DataFrame(data)

# Select the rows whose Age cannot be parsed as a number
df.loc[pd.to_numeric(df['Age'], errors='coerce').isna()]

  Age
4  pp
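
One caveat with the approach above: a value that was already missing (None or NaN) also shows up as NaN after conversion, so it gets reported together with the real glitches. If that matters for your data, a minimal sketch along the following lines may help. The helper name find_non_numeric_rows and the sample values (including the extra "4o" entry and the None) are made up for illustration and are not part of the original question.

```python
import pandas as pd


def find_non_numeric_rows(df, column):
    """Return rows whose value in `column` is present but not parseable as a number."""
    converted = pd.to_numeric(df[column], errors="coerce")
    # NaN after conversion AND not missing originally -> a non-numeric entry
    mask = converted.isna() & df[column].notna()
    return df.loc[mask]


# Hypothetical sample data: one missing value and two non-numeric strings
data = {"Age": [23, 34, None, 33, "pp", 27, "4o"]}
df = pd.DataFrame(data)

bad_rows = find_non_numeric_rows(df, "Age")
print(bad_rows)
```

Here the row holding None is left out of the result, while the rows holding "pp" and "4o" are returned. If you want missing values reported as well, drop the notna() condition and you are back to the one-liner from the answer.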

solution1
0 ACCEPTED 2021-08-02 16:23:01
