Target Variable has Outliers : Machine Learning Regression

Question

I am working on a regression problem where the target variable has close to 2,000 outliers against 54,000 non-outliers.

How should I deal with data where the target variable has outliers?

Things I have tried so far:

Using the entire training data, including outliers: the score is just okay.
Removing the outliers from the training data altogether: the score is worse.
Keeping 80% of the outliers in the training data: the score improves.

Answer 1

My suggestion: if you have outliers in the target variable, don't simply remove those rows from the data set. Instead, try to bring them within the boundary limits.

You can determine the upper and lower boundaries by plotting a box plot:

import seaborn as sns
# Points drawn beyond the whiskers are the candidate outliers
sns.boxplot(x=dataset['target variable'])
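The box plot shows the boundaries visually but does not return them as numbers. As a minimal sketch, assuming the common 1.5 × IQR whisker rule (the default for seaborn and matplotlib box plots), the same limits can be computed directly. The helper name `iqr_bounds` and the small example frame are hypothetical and only for illustration.

```python
from typing import Tuple

import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences at k * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


# Hypothetical toy data standing in for the real data set
dataset = pd.DataFrame({"target variable": [10, 12, 11, 13, 12, 11, 95]})

lower_bound, upper_bound = iqr_bounds(dataset["target variable"])

# Count how many target values fall outside the fences
target = dataset["target variable"]
n_outliers = int(((target < lower_bound) | (target > upper_bound)).sum())
print(lower_bound, upper_bound, n_outliers)
```

The factor `k` is a convention, not a fixed rule; a larger value treats fewer points as outliers.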

You can also count the number of occurrences of each value in the target variable using:

dataset['target variable'].value_counts()

Then cap the values at the upper and lower bounds using the following code:

# upper_bound and lower_bound are the limits you chose from the box plot
dataset.loc[dataset['target variable'] > upper_bound, 'target variable'] = upper_bound
dataset.loc[dataset['target variable'] < lower_bound, 'target variable'] = lower_bound
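The same capping can also be written in one step with pandas' `Series.clip`. This is only a sketch: the bound values and the example data below are hypothetical, and the capped values go into a new column (`target capped`, a name made up here) so the original target stays available for comparison.

```python
import pandas as pd

# Hypothetical toy data standing in for the real data set
dataset = pd.DataFrame({"target variable": [10.0, 12.0, 11.0, 13.0, 95.0, -40.0]})

# Hypothetical limits; in practice, take them from the box plot
lower_bound, upper_bound = 8.0, 15.0

# Values below lower_bound become lower_bound, values above upper_bound become upper_bound
dataset["target capped"] = dataset["target variable"].clip(
    lower=lower_bound, upper=upper_bound
)
print(dataset)
```

Keeping both columns makes it easy to train once on the raw target and once on the capped target and compare the scores.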
