Remove outliers from 3d data elements
I have written a function that removes outliers from a dataset. It uses a z-score-style test (the distance of each value from the median, measured in median absolute deviations) and it works for 1d elements, for example:
# usage remove_outliers(data)
[10 99 12 15 9 2 17 15]---->[10 12 15 9 17 15]
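For reference, this is the arithmetic behind that 1d result. It is a standalone sketch that repeats the steps of the remove_outliers function shown below with its default thresh=2.0; the variable names are only illustrative:

```python
import numpy as np

data = np.array([10, 99, 12, 15, 9, 2, 17, 15])
thresh = 2.0

med = np.median(data)      # median of the values: (12 + 15) / 2 = 13.5
d = np.abs(data - med)     # absolute deviation of each value from the median
mdev = np.median(d)        # median absolute deviation: (3.5 + 3.5) / 2 = 3.5
s = d / mdev               # score: deviation measured in units of mdev

print(np.round(s, 2))      # 99 scores about 24.43 and 2 about 3.29, both >= thresh
print(data[s < thresh])    # the values 10 12 15 9 17 15 remain
```

Only 99 and 2 have a score of 2 or more, which is why they are the two values removed.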
However, it gives the wrong result for 3d data: it pulls my 3d points apart into individual numbers, for example:
# usage remove_outliers(data, thresh=(30,30,30), axis=(0,1))
[(0, 10, 3) (99, 255, 255) (100, 10, 9) (45, 34, 9)]---->[ 0 10 3 99 255 255 100 10 9 45 34 9]
I am expecting a result something like:
[(0, 10, 3) (100, 10, 9) (45, 34, 9)]
What am I doing wrong in my function remove_outliers(), and how can I edit it to handle 3d element data?
import numpy as np

def remove_outliers(data, thresh=2.0, axis=None):
    # A value is an outlier if it lies more than thresh median absolute deviations from the median
    # E.g. thresh = 3, median deviation = 2, median = 18: the value 7 is 11 away (> 3*2), so it is an outlier
    d = np.abs(data - np.median(data, axis))
    mdev = np.median(d, axis)
    s = d/mdev if mdev else 0.0
    return data[s<thresh]
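A standalone reproduction of the 3d call makes the problem visible. It copies the function above and uses the sample points from the question; the axis=0 call at the end is an extra experiment added here, not something from the question:

```python
import numpy as np

def remove_outliers(data, thresh=2.0, axis=None):
    # Copy of the question's function, unchanged apart from formatting
    d = np.abs(data - np.median(data, axis))
    mdev = np.median(d, axis)
    s = d/mdev if mdev else 0.0
    return data[s<thresh]

data = np.array([(0, 10, 3), (99, 255, 255), (100, 10, 9), (45, 34, 9)])

# axis=(0,1) reduces over rows AND columns: one median of all 12 numbers,
# mixing the three coordinates together
pooled = np.median(data, axis=(0, 1))
print(pooled)

# s < thresh is a (4, 3) boolean mask; indexing with it returns the
# selected elements as a flat 1d array instead of whole points
flat = remove_outliers(data, thresh=(30, 30, 30), axis=(0, 1))
print(flat.shape, flat)

# axis=0 gives one median per coordinate, but then mdev is an array
# and "if mdev" has no single truth value, so Python raises ValueError
axis0_error = None
try:
    remove_outliers(data, axis=0)
except ValueError as err:
    axis0_error = err
print("axis=0 raises:", axis0_error)
```

So besides the flattening, axis=(0,1) pools every coordinate into a single median, and the per-coordinate alternative (axis=0) fails on the if mdev test.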
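You need to combine the per-coordinate conditions for each point. Indexing a 2d array with a 2d boolean mask, as data[s<thresh] does, selects individual elements and returns them as a flat 1d array, which is why your points get pulled apart. In the code below the conditions are combined with .all(axis=1), so a point is kept only if every one of its coordinates passes the test.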
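The difference between element-wise and point-wise selection can be seen on a small hand-written mask. The mask values below are made up for illustration and are not the result of any outlier test:

```python
import numpy as np

data = np.array([(0, 10, 3), (99, 255, 255), (100, 10, 9), (45, 34, 9)])

# Made-up per-coordinate test result: True where that coordinate looks fine
coord_ok = np.array([[True,  True,  True],
                     [True,  False, False],
                     [True,  True,  True],
                     [False, True,  True]])

elementwise = data[coord_ok]     # 2d boolean mask -> flat 1d array of loose numbers
row_ok = coord_ok.all(axis=1)    # one bool per point: all coordinates must pass
pointwise = data[row_ok]         # 1d boolean mask over rows -> whole points kept

print(elementwise)
print(row_ok)
print(pointwise)
```

The element-wise result has 9 loose numbers, while the point-wise result keeps the points (0, 10, 3) and (100, 10, 9) intact as rows.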
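import numpy as np

# numpy.median is rather slow, let's build our own instead
# (column-wise median of a 2d array x using a partial sort)
def median(x):
    m,n = x.shape
    # index of the middle row (odd m) or of the two middle rows (even m)
    middle = np.arange((m-1)>>1,(m>>1)+1)
    x = np.partition(x,middle,axis=0)
    return x[middle].mean(axis=0)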
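A quick way to check that the helper agrees with np.median(x, axis=0) for both even and odd numbers of rows. This check is an addition, not part of the original answer, and the random test data is arbitrary:

```python
import numpy as np

def median(x):
    # Column-wise median via partial sort, as in the helper above
    m, n = x.shape
    middle = np.arange((m - 1) >> 1, (m >> 1) + 1)  # 1 index if m is odd, 2 if even
    x = np.partition(x, middle, axis=0)             # middle rows now hold sorted-order values
    return x[middle].mean(axis=0)

rng = np.random.default_rng(0)
for rows in (4, 5):                                 # even and odd row counts
    x = rng.integers(0, 100, size=(rows, 3))
    assert np.allclose(median(x), np.median(x, axis=0))
print("helper matches np.median(x, axis=0)")
```

For an even row count the helper averages the two middle values, which is the same convention np.median uses.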
# main function
def remove_outliers(data,thresh=2.0):
    m = median(data)
    s = np.abs(data-m)
    # keep a point only if all of its coordinates are within thresh median deviations
    return data[(s<median(s)*thresh).all(axis=1)]
# small test
remove_outliers(np.array([(0, 10, 3), (99, 255, 255), (100, 10, 9), (45, 34, 9)]))
# array([[100, 10, 9],
# [ 45, 34, 9]])
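Two details may matter when adapting this. With the default thresh=2.0 the point (0, 10, 3) is also dropped, because its first coordinate is 72 away from the median of 72 while the median deviation of that column is 27.5. And if some coordinate has a median deviation of 0, the test s < 0 is never true, so every point is removed. The sketch below is a hypothetical variant (remove_outliers_rows is an illustrative name) that uses np.median instead of the custom helper and borrows the question's "else 0.0" idea per coordinate:

```python
import numpy as np

def remove_outliers_rows(data, thresh=2.0):
    # Hypothetical variant of the answer's remove_outliers: np.median instead of
    # the custom helper, plus a per-coordinate guard when the deviation is zero.
    data = np.asarray(data)
    med = np.median(data, axis=0)     # one median per coordinate
    dev = np.abs(data - med)          # deviation of every coordinate of every point
    mdev = np.median(dev, axis=0)     # one median absolute deviation per coordinate
    # Where mdev == 0, score that coordinate as 0 (like "else 0.0" in the
    # question's function) so it never marks a point as an outlier.
    safe = np.where(mdev == 0, 1.0, mdev)
    score = np.where(mdev == 0, 0.0, dev / safe)
    # Keep a point only if all of its coordinates are within thresh
    return data[(score < thresh).all(axis=1)]

points = np.array([(0, 10, 3), (99, 255, 255), (100, 10, 9), (45, 34, 9)])
print(remove_outliers_rows(points))              # same two points as the answer
print(remove_outliers_rows(points, thresh=3.0))  # also keeps (0, 10, 3)
```

With thresh=3.0 the sample keeps (0, 10, 3), (100, 10, 9) and (45, 34, 9), which matches the result the question expected; the answer's own remove_outliers gives the same three points at thresh=3.0.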
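Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.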