Dataframe Outlier removal only if it appears on multiple columns for each row Python

Question

I have a DataFrame with multiple columns, and I want to filter out every row that has outlier values in at least 3 columns. How can I do that?

I have used the following dataframe filtering method:

df[df.apply(lambda x: np.abs(x - x.mean()) / x.std() < 3).all(axis=1)]

but it removes a row even when only a single column has an outlier value, because .all() requires every column in the row to pass the check.

Answer 1

We can build a boolean mask of outliers, sum it along each row to count the outlier columns, and keep only the rows with fewer than 3:

m = (df - df.mean()).abs().div(df.std()) >= 3
df[m.sum(axis=1) < 3]

Note: we don't need apply here, because the arithmetic on the whole DataFrame is already vectorized column by column.
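
To see the row-count idea on concrete data, here is a minimal sketch. The column names, the 100-row size, and the planted value 1000.0 are made up for illustration; the z-score threshold of 3 and the limit of 3 columns come from the question. It flags a cell as an outlier when its absolute z-score is at least 3, counts the flags per row, and drops only the rows with 3 or more flagged columns. The stricter .all() filter is included for comparison.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 100 rows, 4 columns of small repeating values.
base = (np.arange(100) % 5).astype(float)
df = pd.DataFrame({"a": base, "b": base * 2, "c": base + 10, "d": base * 0.5})

# Plant extreme values (illustrative only).
df.loc[0, ["a", "b", "c"]] = 1000.0  # row 0: extreme in three columns
df.loc[1, "a"] = 1000.0              # row 1: extreme in one column only

# Absolute z-score per cell, computed column by column.
z = (df - df.mean()).abs().div(df.std())

# True where a cell is an outlier.
is_outlier = z >= 3

# Number of outlier columns in each row.
outlier_count = is_outlier.sum(axis=1)

# Keep rows with fewer than 3 outlier columns.
filtered = df[outlier_count < 3]

# For comparison: the .all() approach drops a row on a single outlier.
strict = df[(~is_outlier).all(axis=1)]

print(len(df), len(filtered), len(strict))
```

Here row 0 is dropped by both filters, while row 1 (a single outlier column) survives the count-based filter and is dropped only by the .all() version. One thing to keep in mind with small frames: an extreme value also inflates the column's own mean and standard deviation, so with very few rows a z-score may never reach 3.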

solution1
0 2020-12-19 16:36:54