How can I replace values in a Python NumPy array with the index of those values found in another array?
I have an n*m array "a", and another 1D array "b", such as the following:
a = array([[ 51, 30, 20, 10],
           [ 10, 32, 65, 77],
           [ 15, 20, 77, 30]])
b = array([10, 15, 20, 30, 32, 51, 65, 77])
I would like to replace every element of "a" with the index in "b" at which that element occurs. In the case above, I would like the output to be:
a = array([[ 5, 3, 2, 0],
           [ 0, 4, 6, 7],
           [ 1, 2, 7, 3]])
Please note that in the real application my arrays are large, with over 30k elements each, and there are several thousand of them. I have tried for loops, but these take a long time to compute. I have also tried similar iterative methods, and using list.index() to grab the indices, but this also takes too much time.
Can anyone help me first identify the indices in "b" of the elements of "a" that appear in "b", and then construct the updated "a" array?
Thank you.
If the minimal and maximal elements of a and b span a small range (or at least one small enough for the table to fit into RAM), this can be done very quickly using a lookup table:
a = np.array([[51, 30, 20, 10],
              [10, 32, 65, 77],
              [15, 20, 77, 30]])
b = np.array([10, 15, 20, 30, 32, 51, 65, 77])
lo = min(a.min(), b.min())
hi = max(a.max(), b.max())
lut = np.zeros(hi - lo + 1, dtype=np.int64)
lut[b - lo] = np.arange(len(b))
Then:
>>> a_indices = lut[a - lo]
>>> a_indices
array([[5, 3, 2, 0],
       [0, 4, 6, 7],
       [1, 2, 7, 3]])
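One caveat with the lookup table as written: it is initialised with zeros, so any value of a that does not occur in b silently maps to index 0. A minimal sketch of one way around this, assuming integer arrays and that -1 is an acceptable marker for "not found" (the function name lut_indices and the missing parameter are hypothetical, not part of the answer above):

```python
import numpy as np

def lut_indices(a, b, missing=-1):
    """Map each element of a to its index in b via a lookup table.

    Assumes integer arrays; values of a absent from b map to `missing`.
    """
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    # Fill with the sentinel instead of zeros so absent values are visible.
    lut = np.full(hi - lo + 1, missing, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    return lut[a - lo]
```

Checking the result for the sentinel, e.g. with (result == -1).any(), then shows whether any element of a was missing from b.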
This is posted as an answer only because it is too long for a comment. It supports orlp's solution posted above. NumPy's vectorize avoids an explicit loop, but it is clearly not the best approach. Note that NumPy's searchsorted can only be applied as shown when b is sorted.
import timeit
import numpy as np

a = np.random.randint(1, 100, (1000, 1000))
b = np.arange(0, 1000, 1)

def o1():
    # Lookup-table approach from orlp's answer.
    lo = min(a.min(), b.min())
    hi = max(a.max(), b.max())
    lut = np.zeros(hi - lo + 1, dtype=np.int64)
    lut[b - lo] = np.arange(len(b))
    a2 = lut[a - lo]
    return a2

def o2():
    # np.vectorize with np.place: one pass over the whole array per element of b.
    a2 = a.copy()
    fu = np.vectorize(lambda i: np.place(a2, a2==b[i], i))
    fu(np.arange(0,len(b),1))

# Each candidate is run twice; b is sorted here, so searchsorted applies directly.
print(timeit.timeit("np.searchsorted(b, a)", globals=globals(), number=2))
print(timeit.timeit("o1()", globals=globals(), number=2))
print(timeit.timeit("o2()", globals=globals(), number=2))
prints
0.061956800000189105
0.012765400000716909
2.220097600000372
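The restriction noted above, that searchsorted needs a sorted b, can be worked around with its sorter argument. A rough sketch, assuming every element of a actually occurs in b (otherwise the result is meaningless or the final indexing can raise an IndexError); the helper name indices_via_searchsorted is hypothetical:

```python
import numpy as np

def indices_via_searchsorted(a, b):
    """Return, for each element of a, its index in a possibly unsorted b.

    Assumes every element of a is present in b.
    """
    order = np.argsort(b)                      # indices that would sort b
    pos = np.searchsorted(b, a, sorter=order)  # positions within sorted b
    return order[pos]                          # map back to positions in b
```

For an already sorted b this reduces to np.searchsorted(b, a), at the cost of one extra argsort.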