
Replace all elements of NumPy array greater than threshold with average of X adjacent values

I have this NumPy array that contains a data set:

array = np.array([3147, 3228, 3351, 3789, 4562, 4987, 5688, 6465, 7012, 7560,
                  7976, 8615, 8698, 8853, 8783, 8949, 9066, 9123, 9172, 9411,
                  9717, 9696, 9848, 10113, 10154, 10227, 10439, 10672, 10287, 10386,
                  10417, 10585, 10607, 10461, 10654, 10739, 10634, 10490, 10544, 10645,
                  10392, 10330, 10044, 9560, 8711, 8152, 7506, 7191, 6994, 6601,
                  6609, 6670, 7293, 32767, 7264, 7262, 7503, 7872, 7826, 8037])

When plotted, it gives a smooth distribution, except for a spike at the outlier value of 32767. Currently I have this, which sets any pixel greater than a threshold value of 16384 to zero:

array[array > 16384] = 0

How can I change this so that, if a pixel is above the threshold, the replacement value is the average of the X values to its left and the X values to its right? If the outlier is at the very first or very last index, the average should be taken only from the side that has values. There could also be multiple values greater than the threshold (in this example there is only one).

The expected output for the example input, using 2 adjacent values on each side, would be calculated as (6670 + 7293 + 7264 + 7262)/4 = 7122.25, giving this result:

array = np.array([3147, 3228, 3351, 3789, 4562, 4987, 5688, 6465, 7012, 7560,
                  7976, 8615, 8698, 8853, 8783, 8949, 9066, 9123, 9172, 9411,
                  9717, 9696, 9848, 10113, 10154, 10227, 10439, 10672, 10287, 10386,
                  10417, 10585, 10607, 10461, 10654, 10739, 10634, 10490, 10544, 10645,
                  10392, 10330, 10044, 9560, 8711, 8152, 7506, 7191, 6994, 6601,
                  6609, 6670, 7293, 7122, 7264, 7262, 7503, 7872, 7826, 8037])
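The arithmetic above can be checked directly. This is only a small sketch using the four neighbours from the example; the names `neighbours` and `pixels` are illustrative and not part of the original data set. It also shows why the expected array holds 7122 rather than 7122.25: the array has an integer dtype, so the fractional part is dropped on assignment.

```python
import numpy as np

# The two left and two right neighbours of the 32767 outlier in the example
neighbours = np.array([6670, 7293, 7264, 7262])

mean = neighbours.mean()
print(mean)        # 7122.25

# Storing the mean back into an integer array drops the fractional part
pixels = np.array([6670, 7293, 32767, 7264, 7262])
pixels[2] = mean
print(pixels[2])   # 7122
```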

Thanks!

This would work:

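import numpy as np

def remove_outlier_pixels(array, adjacent=2, threshold=16384):
    # Indices of all values above the threshold
    outliers = np.flatnonzero(array > threshold)
    for outlier in outliers:
        # Clamp the start at 0 so a negative start index cannot wrap around
        left = array[max(outlier - adjacent, 0):outlier]
        right = array[outlier + 1:outlier + adjacent + 1]
        # Mean of whichever neighbours exist (modifies the array in place)
        array[outlier] = (left.sum() + right.sum()) / (left.size + right.size)
    return array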

This replaces every pixel greater than the threshold with the average of its X left and right adjacent values. It also handles the edge case where the outlier is at the first or last index.

Using this input:

[ 99999 3228 3351 3789 4562 4987 5688 6465 7012 7560 7976 8615 8698 8853 8783 8949 9066 37000 9172 9411 9717 9696 9848 10113 10154 10227 10439 10672 10287 10386 10417 10585 10607 10461 10654 10739 10634 10490 10544 10645 10392 10330 10044 9560 8711 8152 7506 7191 6994 6601 6609 6670 7293 32767 7264 7262 7503 7872 7826 88888 ]

We get:

[ 3289 3228 3351 3789 4562 4987 5688 6465 7012 7560 7976 8615 8698 8853 8783 8949 9066 9149 9172 9411 9717 9696 9848 10113 10154 10227 10439 10672 10287 10386 10417 10585 10607 10461 10654 10739 10634 10490 10544 10645 10392 10330 10044 9560 8711 8152 7506 7191 6994 6601 6609 6670 7293 7122 7264 7262 7503 7872 7826 7849 ]
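For larger arrays, the same replacement could also be done without a Python loop. The following is a sketch of an alternative technique, not the method from the answer above: it uses `np.convolve` to compute, for every position, the sum and the count of in-range neighbours. The function name `replace_outliers_vectorized` and its parameters are hypothetical. It assumes a 1-D array at least `2 * adjacent + 1` elements long, and unlike the loop it leaves other outliers out of each average.

```python
import numpy as np

def replace_outliers_vectorized(values, adjacent=2, threshold=16384):
    """Return a float copy with each outlier replaced by the mean of its valid neighbours."""
    values = np.asarray(values)
    valid = values <= threshold

    # Kernel covering `adjacent` positions on each side, excluding the centre
    kernel = np.ones(2 * adjacent + 1)
    kernel[adjacent] = 0

    # Sum and count of valid neighbours; mode="same" zero-pads at the edges,
    # so positions near the ends only see the neighbours that exist
    neighbour_sum = np.convolve(np.where(valid, values, 0), kernel, mode="same")
    neighbour_count = np.convolve(valid.astype(float), kernel, mode="same")

    result = values.astype(float)
    # Outliers with no valid neighbour at all are left unchanged
    outliers = ~valid & (neighbour_count > 0)
    result[outliers] = neighbour_sum[outliers] / neighbour_count[outliers]
    return result
```

The result is a float array, so the replaced values keep their fractional part (for example 7122.25); calling `.astype(int)` on it truncates them the same way the in-place integer assignment does.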

You can do:

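X = 2  # number of adjacent values on each side
# Mean of the X values on each side of index i (2 * X values in total)
calc_avg = lambda i: sum(array[i + a] + array[i - a] for a in range(1, X + 1)) / (2 * X)
# np.where returns a tuple; [0] is the array of outlier indices
array[array > 16384] = [calc_avg(i) for i in np.where(array > 16384)[0]]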

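This may run into issues, though, if an outlier does not have X values both before and after it: near the start of the array the negative indices wrap around to the end, and near the end of the array the lookup raises an IndexError.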
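The boundary problem mentioned above comes from how NumPy indexing treats positions outside the array. The sketch below illustrates both cases on a small made-up array, then shows one possible way to list only the neighbour indices that actually exist. The helper `neighbour_indices` is hypothetical and not part of the answer.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])

# Near the start, a negative index silently wraps around to the end
print(data[0 - 1])   # 50

# Near the end, an index past the last element raises IndexError
try:
    data[4 + 1]
except IndexError as exc:
    print("IndexError:", exc)

def neighbour_indices(i, size, x):
    """Indices within x positions of i that lie inside the array, excluding i."""
    return [j for j in range(i - x, i + x + 1) if 0 <= j < size and j != i]

print(neighbour_indices(0, data.size, 2))   # [1, 2]
print(neighbour_indices(4, data.size, 2))   # [2, 3]
```

Averaging `data[neighbour_indices(i, data.size, X)]` instead of indexing `i + a` and `i - a` directly would avoid both failure modes, at the cost of a slightly longer expression.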
