How do I remove outliers in a dataset that has both categorical and numerical data?

I'm trying to remove outliers from the 'price' column of a dataset. I have been able to create a data frame of the outliers, together with their corresponding values in the other columns, but I'm struggling to exclude these entries from the parent dataset. How do I go about this?

This is the code I used to create the data frame of outliers described above:

lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
newdf

I tried putting a tilde (~) in front of the boolean conditions, but that didn't give the desired result.

The opposite condition keeps only the rows inside the limits:

newdf = df[(df['price'] >= lower_limit) & (df['price'] <= upper_limit)]
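For illustration, here is a minimal end-to-end sketch of that filter. The question does not show how pq1, pq3 and iqr were computed, so this assumes they are the first quartile, third quartile and interquartile range of 'price'; the sample DataFrame and the names is_outlier, df_clean and df_between are made up for the example. It also shows the tilde working: a likely reason the tilde attempt failed is operator precedence, since ~ binds more tightly than |, so without parentheses around the whole condition it negates only the first comparison.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numerical column
df = pd.DataFrame({
    "category": ["a", "b", "a", "c", "b", "a"],
    "price": [10.0, 12.0, 11.0, 13.0, 12.0, 500.0],
})

# Assumption: pq1 / pq3 are the quartiles of 'price' and iqr their difference
pq1 = df["price"].quantile(0.25)
pq3 = df["price"].quantile(0.75)
iqr = pq3 - pq1
lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# Boolean mask that is True for outlier rows
is_outlier = (df["price"] < lower_limit) | (df["price"] > upper_limit)

# Negate the whole mask with ~ to keep every row that is not an outlier
df_clean = df[~is_outlier]

# Same filter written with Series.between (bounds inclusive by default)
df_between = df[df["price"].between(lower_limit, upper_limit)]
```

All columns, categorical ones included, are carried along because the mask selects whole rows. The two forms differ only if 'price' contains NaN: ~is_outlier keeps those rows, while between drops them.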

You could also use the .loc indexer to select the rows of the original dataframe whose index labels do not appear in the newdf dataframe:

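lower_limit = pq1 - 1.5 * iqr
upper_limit = pq3 + 1.5 * iqr

# rows outside the limits (the outliers)
newdf = df[(df['price'] < lower_limit) | (df['price'] > upper_limit)]
# keep rows whose index label is not in newdf; recent pandas versions
# do not accept a set as an indexer, so use a boolean mask instead
df_not_outliers = df.loc[~df.index.isin(newdf.index)]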
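As an alternative sketch of the same index-based idea, DataFrame.drop can remove the outlier rows by label directly. The sample data and the hard-coded limits below are hypothetical and only stand in for the real pq1/pq3/iqr computation.

```python
import pandas as pd

# Hypothetical data with a non-sequential index to show that row order is kept
df = pd.DataFrame(
    {"category": ["a", "b", "a", "c"], "price": [10.0, 12.0, 11.0, 500.0]},
    index=[3, 1, 2, 0],
)

# Hypothetical limits, hard-coded here for brevity
lower_limit, upper_limit = 5.0, 20.0

# Rows outside the limits (the outliers)
newdf = df[(df["price"] < lower_limit) | (df["price"] > upper_limit)]

# Drop those rows from the parent dataframe by index label
df_not_outliers = df.drop(index=newdf.index)
```

This relies on the index labels being unique: if two rows share a label and only one of them is an outlier, dropping by label removes both.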

