Python DataFrame: remove outliers only when they appear in multiple columns of the same row

Question

I have a DataFrame with multiple columns, and I want to filter out every row that has outlier values in at least 3 of its columns. How can I do that?

I have used the following DataFrame filtering method:

import numpy as np
df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]

But it filters out rows even when only a single column has an outlier value, because of the .all() call.
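To make the problem concrete, here is a small sketch that runs the question's filter on hypothetical toy data (20 rows of zeros with two planted patterns; the column names and values are invented for illustration). Row 0 has an outlier in one column only, row 1 in three columns, yet `.all(axis=1)` drops both.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 20 rows x 4 columns of zeros with two planted outlier patterns
df = pd.DataFrame(np.zeros((20, 4)), columns=["a", "b", "c", "d"])
df.iloc[0, 0] = 100.0    # row 0: outlier in one column only
df.iloc[1, 1:] = 100.0   # row 1: outliers in three columns (b, c, d)

# The filter from the question: keep a row only if *every* column is within 3 std
kept = df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]

# Rows that the filter removed
print(sorted(set(df.index) - set(kept.index)))  # → [0, 1]
```

With 20 rows, a single value of 100 in an otherwise zero column sits about 4.25 sample standard deviations from the mean, so it crosses the threshold of 3, and `.all()` rejects row 0 even though only one of its columns is unusual.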

Answer 1

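We can sum the boolean outlier mask across each row and keep only the rows with fewer than 3 outlier columns:

# True where a value lies 3 or more standard deviations from its column mean
m = (df - df.mean()).abs().div(df.std()) >= 3
# Keep rows that are outliers in fewer than 3 columns
df[m.sum(axis=1) < 3]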

Note: we don't need apply here, since the arithmetic already operates column-wise on the whole DataFrame.
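The same row-count idea can be wrapped in a small reusable helper. This is a minimal sketch, not code from the answer: the function name `drop_multi_column_outliers` and its parameters `z_thresh` and `min_cols` are hypothetical, and it assumes every column of the DataFrame is numeric. It reuses the hypothetical toy data from above, where row 0 has one outlier column and row 1 has three.

```python
import numpy as np
import pandas as pd


def drop_multi_column_outliers(df: pd.DataFrame, z_thresh: float = 3.0, min_cols: int = 3) -> pd.DataFrame:
    """Drop rows whose values are outliers in at least `min_cols` columns.

    A value counts as an outlier when it lies `z_thresh` or more sample standard
    deviations (pandas' default ddof=1) from its column mean.
    """
    z = (df - df.mean()).abs().div(df.std())
    outlier_counts = (z >= z_thresh).sum(axis=1)  # number of outlier columns per row
    return df[outlier_counts < min_cols]


# Hypothetical toy data: row 0 has one outlier column, row 1 has three
df = pd.DataFrame(np.zeros((20, 4)), columns=["a", "b", "c", "d"])
df.iloc[0, 0] = 100.0
df.iloc[1, 1:] = 100.0

cleaned = drop_multi_column_outliers(df)
print(sorted(set(df.index) - set(cleaned.index)))  # → [1]
```

Only row 1 is removed, because it is the only row with 3 or more outlier columns; passing `min_cols=1` reproduces the stricter behaviour of `.all()`. A column with zero variance yields NaN z-scores (0/0), and NaN compares as False, so such a column never counts toward a row's total.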

Answer 1: score 0, posted 2020-12-19 16:36:54