
DataFrame outlier removal only if the outlier appears in multiple columns of a row (Python)

I have a DataFrame with multiple columns, and I want to filter out every row that has outlier values in at least 3 columns. How can I do that?

I have used the following DataFrame filtering method:

import numpy as np
df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]

But because of .all(), it removes a row even when only a single column has an outlier value.
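To make the problem concrete, here is a minimal sketch on a hand-made boolean mask. The column names a to d and the True/False values are made up; True stands for "not an outlier", like the question's |z| < 3 test. .all(axis=1) keeps only rows where every column passes, while summing the negated mask gives the number of outlier columns per row:

```python
import pandas as pd

# Hypothetical inlier mask: True means "this value is NOT an outlier",
# i.e. what the question's lambda (|z| < 3) would produce.
inlier = pd.DataFrame(
    {
        "a": [True, False, False],
        "b": [True, True, False],
        "c": [True, True, False],
        "d": [True, True, True],
    }
)

keep_all = inlier.all(axis=1)          # True only if every column is an inlier
outlier_count = (~inlier).sum(axis=1)  # number of outlier columns in each row

print(keep_all.tolist())       # [True, False, False]
print(outlier_count.tolist())  # [0, 1, 3]
```

Row 1 has a single outlier column, yet .all(axis=1) already rejects it; counting the flags instead lets you pick the cut-off.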

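We can count the outlier columns in each row (by summing a boolean mask along the row) and keep only the rows with fewer than 3 of them:

m = (df - df.mean()).abs().div(df.std()) >= 3  # True where a value is an outlier
df[m.sum(axis=1) < 3]  # keep rows with outliers in fewer than 3 columns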
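The same count-then-threshold idea can be wrapped in a small helper so that the z-score cut-off and the minimum number of outlier columns become parameters. This is only a sketch: the function names count_outlier_columns and drop_multi_column_outliers, the parameters z_thresh and min_cols, and the data are hypothetical. The z-scores use pandas' default sample standard deviation (ddof=1).

```python
import numpy as np
import pandas as pd


def count_outlier_columns(df: pd.DataFrame, z_thresh: float = 3.0) -> pd.Series:
    """Hypothetical helper: per row, count columns whose |z-score| >= z_thresh."""
    # df.mean() and df.std() give one value per column; the subtraction
    # and division align those Series on the column labels.
    z = (df - df.mean()).abs().div(df.std())
    return (z >= z_thresh).sum(axis=1)


def drop_multi_column_outliers(
    df: pd.DataFrame, z_thresh: float = 3.0, min_cols: int = 3
) -> pd.DataFrame:
    """Hypothetical helper: drop rows that are outliers in >= min_cols columns."""
    return df[count_outlier_columns(df, z_thresh) < min_cols]


# Made-up data: 20 rows alternating +1/-1 in four columns.
base = np.tile([1.0, -1.0], 10)
df = pd.DataFrame({col: base.copy() for col in "abcd"})
df.loc[0, ["a", "b", "c"]] = 100.0  # row 0: outlier in 3 columns
df.loc[1, "d"] = 100.0              # row 1: outlier in 1 column only

print(count_outlier_columns(df).head(3).tolist())  # [3, 1, 0]
cleaned = drop_multi_column_outliers(df)
print(len(cleaned), 0 in cleaned.index, 1 in cleaned.index)  # 19 False True
```

On this data the question's .all(axis=1) filter would drop both rows 0 and 1, whereas the helper drops only row 0. The example uses 20 rows on purpose: with ddof=1 a single value can reach at most |z| = (n - 1) / sqrt(n), so in very small frames nothing can ever cross the threshold of 3.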

Note: we don't need apply here, because DataFrame arithmetic such as df - df.mean() already works column by column.
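A quick check of that note on a made-up two-column frame: the apply version from the question and the vectorized version produce the same z-scores. The comparison uses pandas.testing.assert_frame_equal, which by default tolerates tiny floating-point differences.

```python
import numpy as np
import pandas as pd

# Made-up two-column frame, for illustration only.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0, 10.0], "b": [5.0, 4.0, 6.0, 5.0]})

# Column-by-column z-scores via apply, as in the question.
z_apply = df.apply(lambda x: np.abs(x - x.mean()) / x.std())

# Same computation without apply: df.mean() and df.std() return one value
# per column, and DataFrame - Series aligns the Series on the column labels.
z_vec = (df - df.mean()).abs().div(df.std())

pd.testing.assert_frame_equal(z_apply, z_vec)  # raises AssertionError if they differ
```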

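Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.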

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM