Python: efficient way to match 2 arrays of different lengths and find the indices in the larger array

Question

I have 2 arrays: x and bigx. They span the same range, but bigx has many more points. For example:

x = np.linspace(0,10,100)
bigx = np.linspace(0,10,1000)

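I want to find the indices in bigx where x and bigx match to 2 significant figures. I need to do this very quickly, because I need the indices at every step of an integration.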

Using numpy.where is very slow:

index_bigx = [np.where(np.around(bigx,2) == i) for i in np.around(x,2)]

Using numpy.in1d is about 30 times faster:

index_bigx = np.where(np.in1d(np.around(bigx,2), np.around(x,2)))

I also tried using zip and enumerate, since I understand that should be faster, but it returns an empty list:

>>> index_bigx = [i for i,(v,myv) in enumerate(zip(np.around(bigx,2), np.around(x,2))) if myv == v]
>>> print(index_bigx)
[]

I think I must be confused about something here, and I'd like to optimize this as much as possible. Any suggestions?

Answer 1

Since bigx is always uniformly spaced, computing the indices directly is straightforward:

start = bigx[0]
step = bigx[1] - bigx[0]
indices = ((x - start)/step).round().astype(int)

Linear time, no searching required.
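To make the idea concrete, here is a minimal sketch that wraps the three lines above in a function and checks the result on the arrays from the question. The name uniform_grid_indices and the final clipping step are assumptions added for illustration, not part of the original answer.

```python
import numpy as np

def uniform_grid_indices(x, grid):
    """Index of the nearest grid point for each value in x.

    Assumes grid is uniformly spaced and increasing (hypothetical helper).
    """
    start = grid[0]
    step = grid[1] - grid[0]
    idx = np.rint((x - start) / step).astype(int)
    # Clip so values slightly outside the range still map to a valid index.
    return np.clip(idx, 0, len(grid) - 1)

x = np.linspace(0, 10, 100)
bigx = np.linspace(0, 10, 1000)

indices = uniform_grid_indices(x, bigx)
# Largest distance between an x value and the bigx point it was mapped to.
max_err = np.abs(bigx[indices] - x).max()
```

Note that this returns the nearest grid point for every x value, so the match is only as close as half the spacing of bigx. If an exact match after rounding is required, the result still needs to be filtered.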

Answer 2

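Since we are mapping x onto bigx, whose elements are equidistant (and therefore sorted), you can use a binning operation with np.searchsorted and its 'left' option to simulate the index lookup. Here's the implementation: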

out = np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')

Runtime test

In [879]: import numpy as np
     ...: 
     ...: xlen = 10000
     ...: bigxlen = 70000
     ...: bigx = 100*np.linspace(0,1,bigxlen)
     ...: x = bigx[np.random.permutation(bigxlen)[:xlen]]
     ...: 

In [880]: %timeit np.where(np.in1d(np.around(bigx,2), np.around(x,2)))
     ...: %timeit np.searchsorted(np.around(bigx,2), np.around(x,2),side='left')
     ...: 
100 loops, best of 3: 4.1 ms per loop
1000 loops, best of 3: 1.81 ms per loop
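One thing worth keeping in mind: np.searchsorted returns an insertion position for every query, whether or not the rounded value actually occurs in the rounded bigx. The sketch below, which uses the arrays from the question and variable names of my own choosing, shows one way to keep only the positions that are true matches.

```python
import numpy as np

x = np.linspace(0, 10, 100)
bigx = np.linspace(0, 10, 1000)

big_r = np.around(bigx, 2)   # rounding is monotone, so big_r stays sorted
x_r = np.around(x, 2)

# Leftmost insertion position of each rounded x value in rounded bigx.
pos = np.searchsorted(big_r, x_r, side='left')
# Guard against a query larger than every element (position == len(big_r)).
pos = np.clip(pos, 0, len(big_r) - 1)

# True where the value found at that position really equals the query.
exact = big_r[pos] == x_r
matched_indices = pos[exact]
```

Here matched_indices should correspond to what the np.in1d version finds, while pos alone also contains "nearest slot to the right" entries for x values that have no rounded counterpart in bigx.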

Answer 3

If you only need the elements, this should work:

np.intersect1d(np.around(bigx,2), np.around(x,2))

If you need the indices, try this:

around_x = set(np.around(x,2))
index_bigx = [i for i,b in enumerate(np.around(bigx,2)) if b in around_x]

Note: these are untested.
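As a rough check of the index version, the sketch below runs the set-based comprehension next to a vectorized equivalent built from np.isin and np.flatnonzero. The vectorized variant is my own addition rather than part of the answer; np.isin is the newer spelling of np.in1d, which was deprecated in NumPy 2.0.

```python
import numpy as np

x = np.linspace(0, 10, 100)
bigx = np.linspace(0, 10, 1000)

big_r = np.around(bigx, 2)
x_r = np.around(x, 2)

# Set-based lookup, as in the answer (converted to plain Python floats).
around_x = set(x_r.tolist())
idx_loop = [i for i, b in enumerate(big_r.tolist()) if b in around_x]

# Vectorized equivalent: boolean membership mask, then its nonzero positions.
idx_vec = np.flatnonzero(np.isin(big_r, x_r))
```

Both give the positions in bigx whose rounded value appears somewhere in rounded x. The vectorized form avoids the Python-level loop, which usually matters for arrays of this size and larger.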

Answer 1
1 accepted 2015-06-04 22:51:13

Answer 2
1 2015-06-04 22:52:28

Answer 3
0 2015-06-04 20:51:27