Outlier detection and replacing outliers in the complete dataframe
def outliers(column, creditCardData):
    creditCardData[column].describe()  # no effect inside a function: the summary is returned and discarded
    # z-score of every value in the column
    zscore = (creditCardData[column] -
              creditCardData[column].mean())/creditCardData[column].std()
    no_of_out = sum(abs(zscore) > 3)  # count both tails: z > 3 or z < -3
    print('No of outliers: ', no_of_out)
    upper_f = creditCardData[column].mean() + 3*creditCardData[column].std()
    lower_f = creditCardData[column].mean() - 3*creditCardData[column].std()
    no_of_out_up = sum(creditCardData[column]>upper_f)
    no_of_out_lo = sum(creditCardData[column]<lower_f)
    print('Removing outliers____________')
    # Chained indexing (df[col][mask] = ...): pandas may write into a temporary
    # copy here, which is what raises SettingWithCopyWarning
    creditCardData[column][creditCardData[column]>upper_f] = upper_f
    creditCardData[column][creditCardData[column]<lower_f] = lower_f
    # recount against this column's own bounds
    no_of_out_up = sum(creditCardData[column]>upper_f)
    no_of_out_lo = sum(creditCardData[column]<lower_f)
    print('Null values: ', creditCardData[column].isnull().sum())

outliers('PURCHASES', creditCardData)
outliers('ONEOFF_PURCHASES', creditCardData)
Output:
No of outliers: 422
Removing outliers____________
Null values: 0
<ipython-input-137-83ef36d41cf4>:15: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
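The SettingWithCopyWarning above points at the chained assignment `creditCardData[column][mask] = value`. The first `[column]` returns an intermediate Series that pandas may treat as a copy, so the second assignment is not guaranteed to reach the DataFrame (with pandas' Copy-on-Write mode enabled it never does). Below is a minimal sketch of the same mean ± 3·std capping written as a single `.loc[row_mask, column]` assignment, assuming a DataFrame with a numeric column. `cap_outliers` and the toy data are hypothetical, not from the original post:

```python
import pandas as pd


def cap_outliers(df: pd.DataFrame, column: str, k: float = 3.0) -> pd.DataFrame:
    """Cap `column` to mean +/- k*std in place and return df (hypothetical helper)."""
    # Work in float so the float bounds can be stored without a dtype clash.
    df[column] = df[column].astype(float)

    mean = df[column].mean()
    std = df[column].std()  # sample std (ddof=1), as in the original code
    upper = mean + k * std
    lower = mean - k * std

    above = df[column] > upper
    below = df[column] < lower
    print('No of outliers above upper bound:', int(above.sum()))
    print('No of outliers below lower bound:', int(below.sum()))

    # A single .loc[row_mask, column] call writes directly into df,
    # instead of into the intermediate Series that df[column] returns.
    df.loc[above, column] = upper
    df.loc[below, column] = lower

    # Both counts should now be 0.
    print('Remaining above:', int((df[column] > upper).sum()))
    print('Remaining below:', int((df[column] < lower).sum()))
    return df


# Toy data (not the real CreditCardData): 29 ordinary values and one extreme value.
demo = pd.DataFrame({'PURCHASES': [10.0] * 29 + [1000.0]})
cap_outliers(demo, 'PURCHASES')
print(demo['PURCHASES'].max())
```

If `creditCardData` was itself created by slicing another DataFrame (for example by selecting a subset of rows or columns), taking an explicit `.copy()` first keeps later writes from warning as well.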
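The outliers are not being replaced. Can anyone help me with this? I built a function that detects outliers in a feature using the z-score method, and I tried to handle them by replacing the outliers with the upper/lower limit. The function does not produce the result I expect, so can you help me solve this? For the z-score detection, the upper limit is 3 and the lower limit is -3. It is built on the CreditCardData dataset. Please help me or point me in the right direction with the problem I'm facing here!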
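With limits of +3 and -3, detection has to check both tails, i.e. compare the absolute z-score with 3. Since z = (x - mean) / std, the condition |z| > 3 flags exactly the values outside mean ± 3·std, which are the same `upper`/`lower` bounds the capping code computes. A minimal sketch of a two-sided mask, assuming a numeric pandas Series; `zscore_outlier_mask` is a hypothetical helper name and the data is made up:

```python
import pandas as pd


def zscore_outlier_mask(s: pd.Series, threshold: float = 3.0) -> pd.Series:
    """True where the z-score is above +threshold or below -threshold."""
    z = (s - s.mean()) / s.std()  # sample std (ddof=1), pandas' default
    return z.abs() > threshold


# Toy series (not the real dataset): one extreme value in each tail.
s = pd.Series([10.0] * 29 + [1000.0, -1000.0])
mask = zscore_outlier_mask(s)
print('No of outliers:', int(mask.sum()))  # prints: No of outliers: 2
```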
def Outliers(col_name, creditCardData):
    # z-score bounds: mean +/- 3 standard deviations
    mean = creditCardData[col_name].mean()
    std = creditCardData[col_name].std()
    upper = mean + 3 * std
    lower = mean - 3 * std
    print('Upper bound: ', upper)
    print('Lower bound: ', lower, '\n')
    no_of_out_up = sum(creditCardData[col_name]>upper)
    no_of_out_lo = sum(creditCardData[col_name]<lower)
    print('No of outliers above upperbound: ', no_of_out_up)
    print('No of outliers below lowerbound: ', no_of_out_lo, '\n')
    print('Removing outliers____________\n')
    # Chained indexing again: the values may be assigned into a copy
    # (SettingWithCopyWarning) instead of into creditCardData
    creditCardData[col_name][creditCardData[col_name]>upper] = upper
    creditCardData[col_name][creditCardData[col_name]<lower] = lower
    # Recount after capping; nonzero counts mean the assignment did not reach the DataFrame
    no_of_out_up = sum(creditCardData[col_name]>upper)
    no_of_out_lo = sum(creditCardData[col_name]<lower)
    print('No of outliers above upperbound: ', no_of_out_up)
    print('No of outliers below lowerbound: ', no_of_out_lo, '\n')
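To replace outliers in the complete dataframe rather than one column per call, the same mean ± 3·std capping can be applied to every numeric column at once with `DataFrame.clip`, passing the per-column bounds as Series and `axis=1` so they line up with the column labels. This is a sketch assuming every numeric column should be capped; `cap_all_numeric` and the toy frame are hypothetical:

```python
import pandas as pd


def cap_all_numeric(df: pd.DataFrame, k: float = 3.0) -> pd.DataFrame:
    """Return a copy of df with each numeric column clipped to mean +/- k*std."""
    num = df.select_dtypes(include='number')
    mean = num.mean()  # Series indexed by column name
    std = num.std()    # sample std (ddof=1)
    lower = mean - k * std
    upper = mean + k * std

    # axis=1 aligns the lower/upper Series with the column labels, so every
    # column is clipped to its own bounds.
    clipped = num.clip(lower=lower, upper=upper, axis=1)

    out = df.copy()  # leave the caller's frame untouched
    for col in clipped.columns:
        out[col] = clipped[col]
    return out


# Toy frame (not the real dataset); the text column is passed through unchanged.
demo = pd.DataFrame({
    'PURCHASES': [10.0] * 29 + [1000.0],
    'ONEOFF_PURCHASES': [5.0] * 29 + [500.0],
    'NOTE': ['a'] * 30,
})
print(cap_all_numeric(demo).describe())
```

Returning a new frame instead of mutating in place sidesteps the view-versus-copy question entirely: the caller decides whether to rebind, e.g. `creditCardData = cap_all_numeric(creditCardData)`.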