Searching for a 1D array in a 2D array in Python

Question

Say I have a massive 2D database with shape (1.2mil, 6).

I want to find the index of a 1D array of shape (1, 6) in big_DB. I actually have 64 of these vectors to search for at a time, with shape (64, 6).

Here's my code:

for data in range(64): # I have 64 1d arrays to find
    self.idx = np.where((big_DB == arrays[data]).all(axis=1))

This takes 0.043 sec (for all 64 arrays). Is there a faster method to do this? My project will call the search function over 40,000 times.

Edit: big_DB is the result of itertools.product, so its rows are unique. The dtype is float.

Answer 1

The fastest approach I've found is an O(1) lookup using Python's built-in dict type. You need to pre-process your DB, which may take a second or two at most, but lookups go from >100ms on my machine to <50us: an improvement of 2000x or better for all 64 lookups. You may get slightly worse results, because I tested with a 100k-element database. Your larger DB may cause more hash collisions.

To build the lookup hash table, I turned each row of big_DB into a bytes object. These bytes objects are the keys. The values are the row indices, since that's what you want the lookup to return:

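# Void dtype spanning one full row, so each row becomes a single opaque scalar
dt = f'V{big_DB.shape[1] * big_DB.dtype.itemsize}'
# Map the raw bytes of each row to that row's index
dict_db = dict(zip(map(np.void.item, np.squeeze(big_DB.view(dt))), range(len(big_DB))))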

The resulting lookup is as simple as

idx = dict_db[x.view(dt).item()]
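
For reference, here is a minimal end-to-end sketch of the same idea applied to a batch of 64 queries. The small stand-in database built from `levels`, the helper name `find_rows`, and the `-1` sentinel for missing rows are illustrative assumptions, not part of the original question or answer.

```python
import itertools
import numpy as np

# Hypothetical stand-in for big_DB: 4**6 = 4096 unique float rows.
levels = np.array([0.0, 0.5, 1.0, 1.5])
big_DB = np.array(list(itertools.product(levels, repeat=6)), dtype=np.float64)

# One-time preprocessing: view each row as one void scalar and key on its bytes.
dt = f'V{big_DB.shape[1] * big_DB.dtype.itemsize}'
keys = np.ascontiguousarray(big_DB).view(dt).ravel()
dict_db = {key.item(): i for i, key in enumerate(keys)}

# Hypothetical batch of 64 query rows drawn from the database.
rng = np.random.default_rng(0)
expected = rng.integers(0, len(big_DB), size=64)
arrays = big_DB[expected]

def find_rows(queries):
    """Return the row index in big_DB for each query row, or -1 if absent."""
    q = np.ascontiguousarray(queries, dtype=big_DB.dtype).view(dt).ravel()
    return np.array([dict_db.get(k.item(), -1) for k in q])

found = find_rows(arrays)
```

Because the keys are raw bytes, a query matches only if it is bit-identical to a stored row and has the same dtype. For example, 0.0 and -0.0 compare equal with == but have different byte patterns, so they would not match here.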
