
Detecting outliers and replacing them in the complete DataFrame

```python
def outliers(column, creditCardData):
    # describe() returns a summary, but its result is discarded here
    creditCardData[column].describe()

    # z-score of every value in the column
    zscore = (creditCardData[column] - creditCardData[column].mean()) / creditCardData[column].std()
    # counts only z > 3 (upper tail)
    no_of_out = sum(zscore > 3)
    print('No of outliers: ', no_of_out)

    # bounds at mean +/- 3 standard deviations
    upper_f = creditCardData[column].mean() + 3*creditCardData[column].std()
    lower_f = creditCardData[column].mean() - 3*creditCardData[column].std()

    no_of_out_up = sum(creditCardData[column]>upper_f)
    no_of_out_lo = sum(creditCardData[column]<lower_f)

    print('Removing outliers____________')

    # chained indexing (df[col][mask] = value); pandas flags this with SettingWithCopyWarning
    creditCardData[column][creditCardData[column]>upper_f] = upper_f
    creditCardData[column][creditCardData[column]<lower_f] = lower_f

    no_of_out_up = sum(creditCardData[column]>upper_f)
    no_of_out_lo = sum(creditCardData[column]<lower_f)

    print('Null values: ', creditCardData[column].isnull().sum())
```
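The two checks in this function should agree. With the sample mean $\bar{x}$ and sample standard deviation $s$ (pandas `.std()` uses `ddof=1` by default), comparing the z-score against ±3 is the same as comparing the raw value against `upper_f` and `lower_f`:

$$
z_i = \frac{x_i - \bar{x}}{s}, \qquad z_i > 3 \iff x_i > \bar{x} + 3s, \qquad z_i < -3 \iff x_i < \bar{x} - 3s \qquad (s > 0)
$$

So, up to floating-point rounding exactly at the boundary, `sum(zscore > 3)` counts the same values as `no_of_out_up`; the lower tail is only covered by `no_of_out_lo` (or by a check such as `zscore.abs() > 3`).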


```python
outliers('PURCHASES', creditCardData)

outliers('ONEOFF_PURCHASES', creditCardData)
```

Output:

```
No of outliers:  422
Removing outliers____________
Null values:  0
<ipython-input-137-83ef36d41cf4>:15: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
```
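The warning points at chained indexing: `creditCardData[column][mask] = value` first pulls the column out as a Series and then assigns into that intermediate object, so whether the DataFrame itself changes depends on pandas internals (under copy-on-write it never does). A minimal sketch of the single-step `.loc` form, on a small made-up frame (the column name `PURCHASES` is reused for illustration only; the values are invented):

```python
import pandas as pd

# Made-up values for illustration only (not from CreditCardData)
df = pd.DataFrame({"PURCHASES": [10.0, 20.0, 30.0, 1000.0]})
upper = 100.0

# Chained form from the question (avoid): df["PURCHASES"][mask] = upper
# assigns into the intermediate Series returned by df["PURCHASES"].

# Single .loc call: row mask and column label in one indexing operation,
# so the assignment is applied to df itself
df.loc[df["PURCHASES"] > upper, "PURCHASES"] = upper

print(df["PURCHASES"].tolist())  # [10.0, 20.0, 30.0, 100.0]
```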

The outliers are not being replaced. Can anyone help me with this? I built a function that detects outliers in features using the z-score method and then tries to fix them by replacing the outliers with the upper limit, but I couldn't get the expected output from this function. For the z-score detection, the threshold is 3 for the upper limit and -3 for the lower limit. The data is the CreditCardData dataset. Please help me or guide me through the problem I'm facing here!
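The goal described above (flag values with |z| > 3, then cap them at mean ± 3·std) can be sketched as one function. This is an illustrative sketch, not the asker's code: the name `cap_outliers_zscore` and its return dictionary are hypothetical, and it swaps the masked assignments for `Series.clip`, assigning the whole column back so the DataFrame itself is updated.

```python
import pandas as pd


def cap_outliers_zscore(df: pd.DataFrame, column: str, threshold: float = 3.0) -> dict:
    """Cap df[column] in place at mean +/- threshold * std (sample std, ddof=1).

    Returns the bounds and the number of values outside them before and after capping.
    """
    mean = df[column].mean()
    std = df[column].std()
    upper = mean + threshold * std
    lower = mean - threshold * std

    # |z| > threshold flags values in both tails (z > 3 or z < -3)
    zscore = (df[column] - mean) / std
    before = int((zscore.abs() > threshold).sum())

    # clip() replaces values above `upper` with `upper` and below `lower` with
    # `lower`; assigning the whole column back updates df itself
    df[column] = df[column].clip(lower=lower, upper=upper)

    after = int(((df[column] > upper) | (df[column] < lower)).sum())
    return {"upper": upper, "lower": lower, "before": before, "after": after}


if __name__ == "__main__":
    # Made-up data (not CreditCardData): nineteen 1.0 values and one 100.0
    demo = pd.DataFrame({"PURCHASES": [1.0] * 19 + [100.0]})
    print(cap_outliers_zscore(demo, "PURCHASES"))
    print(demo["PURCHASES"].max())
```

Note that the mean and standard deviation are computed with the outliers still included, as in the question, so a single extreme value inflates `std` and widens the bounds.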

```python
def Outliers(col_name, creditCardData):
    # bounds at mean +/- 3 standard deviations (z-score threshold of 3)
    mean = creditCardData[col_name].mean()
    std = creditCardData[col_name].std()

    upper = mean + 3 * std
    lower = mean - 3 * std

    print('Upper bound: ', upper)
    print('Lower bound: ', lower, '\n')

    # count outliers before capping
    no_of_out_up = (creditCardData[col_name] > upper).sum()
    no_of_out_lo = (creditCardData[col_name] < lower).sum()

    print('No of outliers above upperbound: ', no_of_out_up)
    print('No of outliers below lowerbound: ', no_of_out_lo, '\n')

    print('Removing outliers____________\n')
    # one .loc call (row mask + column label) writes into the DataFrame itself,
    # instead of into a temporary Series as chained indexing does
    creditCardData.loc[creditCardData[col_name] > upper, col_name] = upper
    creditCardData.loc[creditCardData[col_name] < lower, col_name] = lower

    # count again: both should now be 0
    no_of_out_up = (creditCardData[col_name] > upper).sum()
    no_of_out_lo = (creditCardData[col_name] < lower).sum()

    print('No of outliers above upperbound: ', no_of_out_up)
    print('No of outliers below lowerbound: ', no_of_out_lo, '\n')
```
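Since the same rule is applied to several columns (`PURCHASES`, `ONEOFF_PURCHASES`), one possible vectorized variant computes all bounds at once and caps with `DataFrame.clip`, where `axis=1` aligns Series-valued bounds with the columns. This is a sketch under those assumptions, not the method used above; the function name `cap_columns_zscore` is hypothetical, and it returns a capped copy instead of modifying the input.

```python
import pandas as pd


def cap_columns_zscore(df: pd.DataFrame, columns: list, threshold: float = 3.0) -> pd.DataFrame:
    """Return a copy of df with each column in `columns` capped at mean +/- threshold * std."""
    means = df[columns].mean()  # Series indexed by column name
    stds = df[columns].std()    # sample std (ddof=1), like Series.std()
    lower = means - threshold * stds
    upper = means + threshold * stds

    out = df.copy()
    # axis=1 aligns the per-column lower/upper Series with the DataFrame's columns
    out[columns] = df[columns].clip(lower=lower, upper=upper, axis=1)
    return out


if __name__ == "__main__":
    # Made-up data with one extreme value per column
    demo = pd.DataFrame({
        "PURCHASES": [1.0] * 19 + [100.0],
        "ONEOFF_PURCHASES": [2.0] * 19 + [-200.0],
    })
    capped = cap_columns_zscore(demo, ["PURCHASES", "ONEOFF_PURCHASES"])
    print(capped.tail(3))
```

Returning a copy keeps the original data available for comparing counts before and after capping.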

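Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM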