How do I find glitches in a dataset?

Question

I'm facing a problem: I have some datasets that contain glitches. For instance, a dataset has a number column. Looking at it, you can easily see that most of the fields hold numbers, but its datatype is object. On top of that, some of the fields have non-numeric values.
For example:
A dataset has an "Age" column: [23, 34, 54, 33, pp, 27, 43] and its datatype is object.
Notice that it has a string value "pp" among the numeric values. This is what I call a glitch in the dataset.
Now my question is: how can I find the rows that contain glitches like "pp"?

Here is an image of what I want to discuss with you.

Thanks.

Answer 1

You can use pd.to_numeric() with errors='coerce' to turn non-numeric values into NaN, and then check for NaN with isna(). Then use .loc to locate the row(s) with those NaN values:

df.loc[pd.to_numeric(df['Age'], errors='coerce').isna()]

Demo

import pandas as pd

# Sample column that mixes integers with one stray string
data = {"Age": [23, 34, 54, 33, 'pp', 27, 43]}
df = pd.DataFrame(data)

df.loc[pd.to_numeric(df['Age'], errors='coerce').isna()]

  Age
4  pp
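
One caveat with the approach above: isna() after coercion also flags rows that were already missing (None/NaN) before conversion. If you only want the values that are present but not numeric, a minimal sketch could combine the coerced mask with notna() on the original column. The helper name find_glitches and the sample values (including the None and the "4O" typo) are made up for illustration and are not from the question:

```python
import pandas as pd

def find_glitches(df, column):
    """Return rows whose value in `column` is present but not numeric (hypothetical helper)."""
    # Non-numeric values become NaN; values that were already missing stay NaN
    converted = pd.to_numeric(df[column], errors="coerce")
    # Keep only rows that failed conversion AND were not missing to begin with
    mask = converted.isna() & df[column].notna()
    return df.loc[mask]

# Assumed sample data: one genuinely missing value, two glitches
data = {"Age": [23, 34, None, 33, "pp", 27, "4O"]}
df = pd.DataFrame(data)

glitches = find_glitches(df, "Age")
print(glitches)
```

Here the row holding None is left out of the result, so the output lists only the rows that actually need cleaning.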

Solution 1
0 Accepted 2021-08-02 16:23:01

如何在数据集中找到毛刺？

问题描述

1 个解决方案

解决方案1 0 已采纳 2021-08-02 16:23:01

解决方案1
0 已采纳 2021-08-02 16:23:01