How to replace values in a Python NumPy array with the indices where those values are found in another array?

Question

I have an n*m array "a", and another 1D array "b", such as the following:

a = array([[ 51, 30, 20, 10],
           [ 10, 32, 65, 77],
           [ 15, 20, 77, 30]])

b = array([10, 15, 20, 30, 32, 51, 65, 77])

I would like to replace every element of "a" with the index at which that element appears in "b". In the case above, I would like the output to be:

a = array([[ 5, 3, 2, 0],
           [ 0, 4, 6, 7],
           [ 1, 2, 7, 3]])

Please note that in my real application the arrays are large: over 30k elements, and there are several thousand of them. I have tried for loops, but these take a long time to compute. I have also tried similar iterative methods, using list.index() to grab the indices, but this also takes too much time.

Can anyone help me first identify the indices in "b" of the elements of "a" that appear in "b", and then construct the updated "a" array?

Thank you.
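For comparison, here is a minimal sketch of why list.index() is slow and what a cheaper pure-Python baseline looks like. list.index() rescans b from the start for every element of a, so the total cost grows roughly with a.size × len(b). Building a dict from value to index once makes each lookup O(1) on average. This sketch assumes every value of a occurs in b (otherwise it raises KeyError) and that the values of b are unique; the name pos is illustrative only.

```python
import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

# Map each value of b to its index once, instead of scanning b per element.
pos = {value: i for i, value in enumerate(b.tolist())}

# Still a Python-level loop over a, but each lookup no longer scans b.
result = np.array([pos[v] for v in a.ravel().tolist()]).reshape(a.shape)
print(result)
```

This is still interpreted Python per element, so for arrays of the size described it will remain far slower than a fully vectorized NumPy approach.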

Answer 1

If the minimal/maximal elements of a and b span a small range (or at least one small enough to fit into RAM), this can be done very quickly using a lookup table:

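import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

# Build a table covering every value from lo to hi:
# lut[v - lo] holds the index of v in b (values absent from b keep the default 0).
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
lut = np.zeros(hi - lo + 1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))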

Then:

>>> a_indices = lut[a - lo]
>>> a_indices
array([[5, 3, 2, 0],
       [0, 4, 6, 7],
       [1, 2, 7, 3]])
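Because the table above starts as np.zeros, any value of a that does not occur in b silently maps to 0, which is indistinguishable from a genuine match with b[0]. A minimal variant, assuming you would rather flag such values, fills the table with -1 as a "not found" sentinel (the sentinel choice is an assumption, not part of the original answer). It still assumes integer values and unique entries in b.

```python
import numpy as np

a = np.array([[51, 30, 20, 99],    # 99 is deliberately absent from b
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])

lo = min(a.min(), b.min())
hi = max(a.max(), b.max())

# -1 marks "value not present in b", so it cannot be confused with index 0.
lut = np.full(hi - lo + 1, -1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))

a_indices = lut[a - lo]
print(a_indices)
```

Memory use is proportional to hi - lo + 1, so this only pays off when the value range is modest, as the answer notes.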

Answer 2

This is posted as an answer only because it is too long for a comment. It supports orlp's solution posted above. NumPy's vectorize avoids writing an explicit loop (internally it still calls the Python function once per element), but it is clearly not the best approach. Note that NumPy's searchsorted can only be applied as shown when b is sorted (and every value of a is present in b).

import timeit
import numpy as np

a = np.random.randint(1,100,(1000,1000))
b = np.arange(0,1000,1)

def o1():
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    lut = np.zeros(hi - lo + 1, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    a2 = lut[a - lo]
    return a2 

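def o2():
    # np.vectorize only hides the Python loop: one full pass over a2 per element of b.
    # a2 is updated in place, which can clobber earlier replacements in general;
    # here b[i] == i, so the result is unaffected.
    a2 = a.copy()
    fu = np.vectorize(lambda i: np.place(a2, a2 == b[i], i))
    fu(np.arange(0, len(b), 1))
    return a2

# searchsorted is valid here because b is sorted and contains every value of a.
print(timeit.timeit("np.searchsorted(b, a)", globals=globals(), number=2))
print(timeit.timeit("o1()", globals=globals(), number=2))
print(timeit.timeit("o2()", globals=globals(), number=2))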

prints (total seconds for two runs each of searchsorted, o1 and o2):

0.061956800000189105
0.012765400000716909
2.220097600000372
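If b is not sorted, searchsorted can still be used through its sorter argument. The positions it returns refer to the sorted order, so they have to be mapped back through the sorting permutation. Since searchsorted returns an insertion point even for values that are absent from b, each match should also be verified. The sketch below does both; the unsorted b and the -1 "not found" marker are assumptions for illustration.

```python
import numpy as np

a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
# Same values as in the question, but deliberately not sorted.
b = np.array([77, 10, 51, 15, 65, 20, 32, 30])

order = np.argsort(b)                      # permutation that sorts b
pos = np.searchsorted(b, a, sorter=order)  # insertion points in sorted(b)
pos = np.clip(pos, 0, len(b) - 1)          # keep positions past the end indexable
idx = order[pos]                           # map back to indices in the original b

# Confirm each match before trusting it; -1 marks "not found".
found = b[idx] == a
idx = np.where(found, idx, -1)
print(idx)
```

This costs O(len(b) log len(b)) for the sort plus O(a.size log len(b)) for the lookups, and unlike the lookup table it does not depend on the range of the values.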

2 answers

Answer 1: score 1, posted 2020-12-29 02:18:52

Answer 2: score 0, accepted, posted 2020-12-29 05:47:09
