How do I remove outliers from a dataset that has both categorical and numerical data?

Question

I'm trying to remove outliers from the 'price' column of a dataset. I have been able to create a data frame containing the outliers together with their corresponding values in the other columns, but I'm struggling to exclude these rows from the parent dataset. How do I go about this?

This is the code I used to create the new data frame mentioned above:

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
newdf

I tried putting a tilde (~) in front of the boolean expression, but that didn't give the desired result.

Answer 1

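The opposite condition would be the following (using >= and <= so that values lying exactly on a limit are kept, which makes it the exact complement of the outlier filter):

newdf = df[(df['price'] >= lower_limit) & (df['price'] <= upper_limit)]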
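
For illustration, here is a minimal self-contained sketch of the same idea. It assumes that pq1 and pq3 are the first and third quartiles of 'price' and that iqr is their difference (the question does not show how they were computed), and the sample data is made up. It also shows that negating the outlier mask with ~ works as long as the whole condition is wrapped in parentheses or stored in a variable first.

```python
import pandas as pd

# Hypothetical sample data; column names mirror the question, values are made up.
df = pd.DataFrame({
    "category": ["a", "b", "a", "c", "b", "c", "a"],
    "price": [10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 100.0],
})

# Assumption: pq1 / pq3 are the 25th / 75th percentiles of 'price'.
pq1 = df["price"].quantile(0.25)
pq3 = df["price"].quantile(0.75)
iqr = pq3 - pq1

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Boolean mask that is True for the outlier rows.
is_outlier = (df["price"] < lower_limit) | (df["price"] > upper_limit)

# Negate the whole mask. ~ binds tighter than | and &, so an unparenthesized
# expression would negate only its first comparison.
df_clean = df[~is_outlier]

# Same filter via Series.between, which includes both endpoints by default.
df_clean_between = df[df["price"].between(lower_limit, upper_limit)]
```

One difference worth knowing: rows where 'price' is NaN are kept by `~is_outlier` (the comparisons are False for NaN) but dropped by `between`, so the two forms only agree when the column has no missing values.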

Answer 2

You could use the .loc indexer to select the subset of your original dataframe that excludes the rows of the newdf dataframe, matched through their index labels:

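lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
# Keep only the rows whose index label is not in newdf.
# (pandas 2.0 and later reject a set as a .loc indexer, so use a boolean mask instead.)
df_not_outliers = df.loc[~df.index.isin(newdf.index)]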
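
If the outlier frame already exists, another option is to pass its index to DataFrame.drop. The following is only a sketch: the data and the hard-coded limits are made up for illustration, and it assumes the index labels of df are unique.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {"category": ["a", "b", "c", "a"], "price": [10.0, 11.0, 12.0, 500.0]}
)

# Stand-in for the limits from the question (hard-coded here, not IQR-based).
lower_limit, upper_limit = 0.0, 100.0
newdf = df[(df["price"] < lower_limit) | (df["price"] > upper_limit)]

# Drop the outlier rows by index label; df itself is left unchanged.
df_not_outliers = df.drop(index=newdf.index)
```

If the index contains duplicate labels, drop removes every row carrying a dropped label, so resetting the index first may be safer in that case.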

Answer 1
0 2020-09-24 16:27:01

Answer 2
0 2020-09-25 21:01:51
