How to find outliers in a given dataset using Python

Question

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]

slope = 0.76083799
intercept = 77.87127406

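To me, the coordinate marked in brown, (259, 168), is a potential outlier and therefore needs to be removed. So far I have been trying to use studentized residuals and jackknife residuals to eliminate such outliers. However, I have not been able to compute these residuals for the dataset I have.

It would be very helpful if you could show me how to find the residuals for the dataset above.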

Code

import numpy as np
import matplotlib.pyplot as plt

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]

x=[x1[0] for x1 in coordinates]
y=[x1[1] for x1 in coordinates]

for x1,y1 in coordinates:
   plt.plot(x1,y1,marker="o",color="brown")
plt.show()

# using numpy polyfit method to find regression line slope and intercept 
z = np.polyfit(x,y,1)
print(z)
slope = z[0]
intercept =z[1]

newx = np.linspace(62,321,200)
newy = np.poly1d(z)
plt.plot(x,y, 'o', newx, newy(newx),color="black")
# plt.plot()
plt.plot(259,168,marker="o",color="brown")
plt.show()

#TODO
#remove the outliers and then display

Answer 1

x and y are first placed into np.ndarrays, so that the residuals can be computed with elementwise arithmetic.
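As a small illustration of why the conversion matters (a sketch that uses only the first three x values from the question): multiplying a plain Python list by the float slope raises a TypeError, while the same expression on an ndarray is applied to every element.

```python
import numpy as np

x_list = [259, 62, 143]          # first three x values from the question
x_arr = np.array(x_list)
slope = 0.76083799               # slope reported in the question

try:
    slope * x_list               # float * list is not defined in Python
    list_ok = True
except TypeError:
    list_ok = False

scaled = slope * x_arr           # elementwise multiplication on an ndarray
print(list_ok)
print(scaled)
```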

Input:

import numpy as np
import matplotlib.pyplot as plt

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]

x=np.array([x1[0] for x1 in coordinates]) #Placed into array
y=np.array([x1[1] for x1 in coordinates]) #Placed into array

for x1,y1 in coordinates:
   plt.plot(x1,y1,marker="o",color="brown")
plt.show()

# using numpy polyfit method to find regression line slope and intercept 
z = np.polyfit(x,y,1)
print(z)
slope = z[0]
intercept =z[1]

newx = np.linspace(62,321,200)
newy = np.poly1d(z)
plt.plot(x,y, 'o', newx, newy(newx),color="black")
# plt.plot()
plt.plot(259,168,marker="o",color="brown")
plt.show()

Additional code:

print("old y: " + repr(y)) #Display original array of y values
print("old x: " + repr(x)) 
residual_array = abs(y - (intercept + slope * x)) #Create an array of residuals
max_accept_deviation = 100 #An arbitrary value of "acceptable deviation"
mask = residual_array >= max_accept_deviation #Create an array of TRUE/FALSE values. TRUE where residual array is larger than deviation
rows_to_del = tuple(te for te in np.where(mask)[0]) #np.where converts the mask to a list of row numbers which is converted to a tuple
cleaned_y = np.delete(y,rows_to_del) #np.delete deletes all row numbers in the earlier tuple
cleaned_x = np.delete(x,rows_to_del)
print("new y: " + repr(cleaned_y)) #Print the cleaned values
print("new x: " + repr(cleaned_x))

Output:

[  0.76083799  77.87127406]
old y: array([168, 133, 163, 270, 385])
old x: array([259,  62, 143, 174, 321])
new y: array([133, 163, 270, 385])
new x: array([ 62, 143, 174, 321])
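The cutoff of 100 used above is arbitrary. The question also asks about studentized and jackknife residuals, and for a straight-line fit these can be sketched with plain NumPy. This is not part of the original answer; it uses the standard textbook formulas for simple linear regression, and the variable names (leverage, r_student, t_jackknife) are chosen here for illustration. With only five points, the values are a rough guide rather than a reliable test.

```python
import numpy as np

coordinates = [(259, 168), (62, 133), (143, 163), (174, 270), (321, 385)]
x = np.array([c[0] for c in coordinates], dtype=float)
y = np.array([c[1] for c in coordinates], dtype=float)

n = len(x)
p = 2  # number of fitted parameters: slope and intercept

# Least-squares line and raw (signed) residuals
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

# Leverage of each point for a straight-line fit:
# h_i = 1/n + (x_i - mean(x))^2 / sum((x - mean(x))^2)
sxx = np.sum((x - x.mean()) ** 2)
leverage = 1.0 / n + (x - x.mean()) ** 2 / sxx

# Internally studentized residuals: residual divided by its estimated
# standard deviation, using the variance estimate from all points
s2 = np.sum(resid ** 2) / (n - p)
r_student = resid / np.sqrt(s2 * (1.0 - leverage))

# Externally studentized (jackknife) residuals: equivalent to refitting
# without point i when estimating the variance
t_jackknife = r_student * np.sqrt((n - p - 1) / (n - p - r_student ** 2))

# Point with the largest absolute jackknife residual
suspect = int(np.argmax(np.abs(t_jackknife)))
print("studentized:", np.round(r_student, 3))
print("jackknife:  ", np.round(t_jackknife, 3))
print("most extreme point:", coordinates[suspect])
```

On this dataset the most extreme point is (259, 168), the same one removed by the fixed threshold. A cutoff on the absolute jackknife residual (a value such as 2 or 3 is a common rule of thumb) could replace max_accept_deviation in the mask.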

Solution 1
1 2016-08-10 15:21:53
