
Replace all non-unique values in numpy array by value not in dataset

I am dealing with large (masked) 2D numpy arrays that originate from country-wide raster datasets with resolutions of 10 to 200 meters. The arrays are very large and can contain several million values.

I would like to perform the following operation on these kinds of arrays in the most efficient way possible:

in_array = numpy.array([[1,2,2],[4,4,6]])
out_array = uniqify(in_array)
print(out_array)
>>>
numpy.array([[1,2,3],[4,5,6]])

or some other combination of numbers. The actual values do not matter; what I care about is that there are NO duplicate values anywhere in the array. Each cell value must be unique, and its magnitude is irrelevant.

This is one way of doing it, but I worry that it may not scale to large datasets:

import numpy as np

def uniqify(array):
    # Overwrite every cell, in place, with a running counter
    count = 0
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            array[i, j] = count
            count += 1
    return array


array = np.array([[100, 2, 3], [4, 5, 5], [4, 8, 7]])
uniqified = uniqify(array)
print(uniqified)

Are there any ready-made, computationally efficient methods to do this without nested for loops?

Thanks

You can simply use:

  out_array = np.arange(in_array.size).reshape(in_array.shape)
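As a minimal sketch, the one-liner can be wrapped in the `uniqify` function named in the question. The uniqueness check at the end is only there for illustration, and a plain (unmasked) array is assumed:

```python
import numpy as np

def uniqify(array):
    """Return a new array of the same shape filled with 0, 1, 2, ... (all unique)."""
    return np.arange(array.size).reshape(array.shape)

in_array = np.array([[1, 2, 2], [4, 4, 6]])
out_array = uniqify(in_array)
print(out_array)

# Every value occurs exactly once if the number of distinct values equals the size
print(np.unique(out_array).size == out_array.size)
```

Unlike the nested loops, this leaves `in_array` untouched and returns a new array.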

You can modify an array in place by selecting all values via [:]:

A = np.array([[1,2,2],[4,4,6]])

A[:] = np.arange(A.size).reshape(A.shape)

array([[0, 1, 2],
       [3, 4, 5]])
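If the values that are already unique should be kept, as the title suggests, something along these lines might work. This is an assumption rather than part of the answer above: `replace_duplicates` is a hypothetical helper, it assumes a non-empty, plain (unmasked) integer array, and it keeps the first occurrence of each value while overwriting later repeats with integers above the array maximum.

```python
import numpy as np

def replace_duplicates(array):
    """Hypothetical helper: keep first occurrences, replace repeats with new values."""
    flat = array.ravel().copy()  # copy so the input array is left untouched
    # first_idx holds the flat index of the first occurrence of each distinct value
    _, first_idx = np.unique(flat, return_index=True)
    is_repeat = np.ones(flat.size, dtype=bool)
    is_repeat[first_idx] = False
    n_repeats = int(is_repeat.sum())
    # Values above the current maximum cannot collide with existing ones
    # (assumes max + n_repeats still fits in the array's integer dtype)
    start = flat.max() + 1
    flat[is_repeat] = np.arange(start, start + n_repeats)
    return flat.reshape(array.shape)

A = np.array([[1, 2, 2], [4, 4, 6]])
B = replace_duplicates(A)
print(B)
```

This relies on a sort inside `np.unique`, so it is slower than the `np.arange` approach. It is only worth it when the original values need to survive.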
