I am currently working on a regression problem where the target variable has close to 2000 outliers against 54000 non-outliers.
I would like to know how to deal with data where the target variable has outliers.
Things I have tried so far:
My suggestion: if you have outliers in the target variable, don't simply remove those rows from the data set. Instead, try to bring them within the boundary limits.
You can determine the upper and lower boundaries by plotting a box plot:
import seaborn as sns
sns.boxplot(x=dataset['target variable'])
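Reading the bounds off the plot by eye can be imprecise. As a rough sketch, the same boundaries can be computed numerically with the 1.5 × IQR rule, which is also the default whisker length seaborn uses for box plots. The helper name `iqr_bounds`, the multiplier `k`, and the toy data here are assumptions for illustration, not part of the original answer.

```python
from typing import Tuple

import pandas as pd


def iqr_bounds(values: pd.Series, k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) bounds at k * IQR beyond the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return float(q1 - k * iqr), float(q3 + k * iqr)


# Toy data standing in for the real data set (hypothetical values).
dataset = pd.DataFrame(
    {"target variable": [10, 12, 11, 13, 12, 11, 95, 10, 12, -40]}
)

lower_bound, upper_bound = iqr_bounds(dataset["target variable"])

# Rows whose target falls outside the computed bounds.
outliers = dataset[
    (dataset["target variable"] < lower_bound)
    | (dataset["target variable"] > upper_bound)
]
print(lower_bound, upper_bound, len(outliers))
```

A larger `k` (for example 3.0) flags only more extreme values. Whether 1.5 is appropriate depends on how skewed the target is.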
You can also count the number of occurrences of each value in the target variable using
dataset['target variable'].value_counts()
Then replace the values beyond each bound with the limit you have chosen (typically the bound itself) using the following code:
dataset.loc[dataset['target variable'] > upper_bound, 'target variable'] = upper_limit
dataset.loc[dataset['target variable'] < lower_bound, 'target variable'] = lower_limit
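When the replacement values are the bounds themselves, the two assignments above can be expressed with pandas' `Series.clip`. This is a minimal sketch on made-up data. The bound values and the `target clipped` column name are assumptions chosen for illustration.

```python
import pandas as pd

# Toy data standing in for the real data set (hypothetical values).
dataset = pd.DataFrame({"target variable": [10.0, 12.0, 11.0, 95.0, -40.0]})

# Hypothetical bounds; in practice these would come from the box plot
# or from a quantile-based rule.
lower_bound, upper_bound = 7.0, 15.0

# Cap values outside [lower_bound, upper_bound]; values inside are unchanged.
# Writing to a new column keeps the original target for comparison.
dataset["target clipped"] = dataset["target variable"].clip(
    lower=lower_bound, upper=upper_bound
)

# Number of rows that were actually capped.
n_capped = int((dataset["target variable"] != dataset["target clipped"]).sum())
print(dataset)
print(n_capped)
```

Keeping the capped values in a separate column makes it easy to check how many rows changed before overwriting the original target.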