
Removing outliers from numerical features

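Hi, I'm trying to remove outliers from the columns that hold numerical features, but when I run my code the entire dataset gets removed. Can anyone tell me what I'm doing wrong?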

    numerical_columns = data.select_dtypes(include=['int64','float64']).columns.tolist()

    print('Number of rows before discarding outlier = %d' % (data.shape[0]))

    for i in numerical_columns:
        q1 = data[i].quantile(0.25)
        q3 = data[i].quantile(0.75)
        iqr = q3-q1 #Interquartile range
        fence_low  = q1-1.5*iqr
        fence_high = q3+1.5*iqr
        data = data.loc[(data[i] > fence_low) & (data[i] < fence_high)]

    print('Number of rows after discarding outlier = %d' % (data.shape[0]))
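The thread does not say why every row disappears, but the loop above has traits that can produce exactly that result; treat these as likely causes, not a confirmed diagnosis. If any numeric column has the same value across its middle 50% (common for 0/1 flags or mostly-zero counts), then q1 == q3, the IQR is 0, fence_low equals fence_high, and the strict > and < comparisons reject every row. Rows with NaN are dropped too, because any comparison with NaN is False. And since data is reassigned inside the loop, each later column's quartiles are computed only on rows that survived the earlier filters. The sketch below assumes pandas; the helper remove_iqr_outliers and the toy data are hypothetical. It computes every fence once on the unfiltered data, uses inclusive bounds, keeps missing values, and also replays the original loop on the same data for comparison.

```python
import pandas as pd


def remove_iqr_outliers(data, columns=None, k=1.5):
    """Drop rows outside the fences [q1 - k*IQR, q3 + k*IQR].

    Fences are computed once per column on the unfiltered data, and the
    bounds are inclusive, so a column whose IQR is 0 keeps the rows that
    sit exactly at q1 == q3 instead of losing every row.
    """
    if columns is None:
        # "number" is broader than the question's ['int64', 'float64'] list.
        columns = data.select_dtypes(include="number").columns
    numeric = data[columns]
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    low = q1 - k * iqr
    high = q3 + k * iqr
    # Compare every column against its own fences (aligned on column names).
    inside = numeric.ge(low, axis="columns") & numeric.le(high, axis="columns")
    # NaN compares as False; keep missing values rather than dropping the row.
    inside |= numeric.isna()
    # A row survives only if every selected column is inside its fences.
    return data.loc[inside.all(axis="columns")]


# Hypothetical toy data: "flag" is a mostly-zero 0/1 column.
data = pd.DataFrame({
    "value": [10, 12, 11, 13, 12, 95],
    "flag": [0, 0, 0, 0, 1, 0],
})

print(len(remove_iqr_outliers(data)))                     # 4
print(len(remove_iqr_outliers(data, columns=["value"])))  # 5

# The question's loop on the same data: strict bounds plus re-filtering.
buggy = data.copy()
for col in ["value", "flag"]:
    q1, q3 = buggy[col].quantile(0.25), buggy[col].quantile(0.75)
    iqr = q3 - q1
    buggy = buggy.loc[(buggy[col] > q1 - 1.5 * iqr) & (buggy[col] < q3 + 1.5 * iqr)]
print(len(buggy))  # 0
```

With all numeric columns checked, the fixed version still drops the single flag == 1 row, because its IQR fences are [0, 0]. Passing only the continuous columns through columns= is usually the better choice for binary or categorical codes.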

The code below works for me. Here col is the numerical column of the dataframe from which outliers should be removed.

    import numpy as np

    # Remove outliers: keep only the rows whose value in col lies
    # within 3 standard deviations (above or below) of the column mean
    df = df[np.abs(df[col] - df[col].mean()) <= (3 * df[col].std())]
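The answer filters one column at a time. A sketch of applying the same 3-standard-deviation rule to every numeric column at once, assuming pandas and NumPy and using a hypothetical helper remove_zscore_outliers, computes each column's mean and standard deviation once on the unfiltered frame and keeps a row only if all of its values pass. It keeps the answer's form (|x - mean| <= 3 * std, with no division by std), so a constant column, whose std is 0, keeps all of its rows instead of turning into NaN. Two caveats apply to the answer as well: rows with NaN in a checked column are dropped, because the comparison is False; and pandas' std uses ddof=1, so in a column of n values no point can lie more than (n - 1)/sqrt(n) standard deviations from the mean. That bound is below 3 for n <= 10, so on such small frames the rule removes nothing.

```python
import numpy as np
import pandas as pd


def remove_zscore_outliers(df, columns=None, threshold=3.0):
    """Keep rows whose values lie within `threshold` sample standard
    deviations of the column mean in every selected column."""
    if columns is None:
        columns = df.select_dtypes(include="number").columns
    numeric = df[columns]
    # Mean and std (ddof=1, the pandas default) are computed once, on the
    # unfiltered data, so filtering one column cannot shift another's stats.
    deviation = np.abs(numeric - numeric.mean())
    inside = deviation.le(threshold * numeric.std(), axis="columns")
    # NaN deviations compare as False, so rows with missing values are dropped.
    return df.loc[inside.all(axis="columns")]


# Hypothetical toy data: one extreme value in "value", none in "score".
df = pd.DataFrame({
    "value": [10, 11, 9] * 6 + [10, 100],
    "score": list(range(20)),
})
print(len(remove_zscore_outliers(df)))  # 19 (the row with value 100 is gone)

# With only six rows, even 95 among values near 12 survives the 3-std rule.
small = pd.DataFrame({"v": [10, 12, 11, 13, 12, 95]})
print(len(remove_zscore_outliers(small)))  # 6
```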


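Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.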

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM