
Comparing 2D boolean arrays

I am working on a problem where I need to compare one particular array to hundreds of thousands of others and return a list of results showing how similar each of them is to it. I read that numpy is probably the best library for working with arrays (if there's anything better, please let me know :) so I wrote this, but it's still slow. I am not the best at programming, so any help improving this would be immensely appreciated!

import numpy as np

list_of_arrays = [np.random.randint(0, 2, (30, 30)) for array in range(100000)]
base_array = np.random.randint(0, 2, (30, 30))
results = []

for array in list_of_arrays:
    results.append(np.sum(np.equal(base_array, array)))

You can use numpy broadcasting magic to do it in one line, without a list comprehension or loops of any kind:

results = np.equal(base_array, list_of_arrays).sum(axis=1).sum(axis=1)

You have so many arrays that it can't get much faster ;)
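As a rough check that the broadcast one-liner counts the same matches as the loop in the question, here is a minimal sketch. The seed, the smaller array count (`n_arrays = 1000`), and the intermediate names `stacked` and `matches` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only so the sketch is repeatable
n_arrays = 1000  # assumption: far fewer than the question's 100000, to keep the check quick

list_of_arrays = [rng.integers(0, 2, (30, 30)) for _ in range(n_arrays)]
base_array = rng.integers(0, 2, (30, 30))

# Loop version, as in the question: one count of matching cells per array.
loop_results = [np.sum(np.equal(base_array, a)) for a in list_of_arrays]

# Broadcast version: the list is converted to one (n_arrays, 30, 30) array,
# and base_array of shape (30, 30) is compared against every slice at once.
stacked = np.asarray(list_of_arrays)
matches = np.equal(base_array, stacked)          # boolean, shape (n_arrays, 30, 30)
broadcast_results = matches.sum(axis=1).sum(axis=1)  # shape (n_arrays,)

assert np.array_equal(loop_results, broadcast_results)
```

Note that the one-liner still has to convert the Python list into a single array internally, so part of its cost is that conversion rather than the comparison itself.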

There are a number of efficient tricks for doing this in numpy. None of them require explicit loops or appending to a list.

First, make the list into an array:

list_of_arrays = np.random.randint(0, 2, (100000, 30, 30), dtype=bool)

Notice how much simpler (and faster) that is. Now make a boolean base:

base_array = np.random.randint(0, 2, (30, 30), dtype=bool)
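The two lines above generate fresh random data directly as arrays. If the arrays already exist as a Python list, as in the question, one possible way to get the same layout is to stack them once and cast to bool. This is only a sketch: `existing_list` and `stacked` are hypothetical names, and the list size is reduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)  # seed chosen only so the sketch is repeatable

# Stand-in for data that already exists as a list of 30x30 integer arrays.
existing_list = [rng.integers(0, 2, (30, 30)) for _ in range(1000)]

# Stack once into a single (N, 30, 30) array, then cast 0/1 integers to bool.
stacked = np.stack(existing_list).astype(bool)

print(stacked.shape, stacked.dtype)
```

Doing this conversion once up front avoids repeating it on every comparison.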

The simplest comparison makes direct use of broadcasting:

results = (base_array == list_of_arrays).sum((1, 2))

The equality of two booleans can also be obtained from their XOR:

results = (~base_array ^ list_of_arrays).sum((1, 2))

Running ~ on base_array is much faster than doing it on list_of_arrays or on the result of the XOR, because base_array has only 900 elements, and it has the same logical effect.
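A small sketch of why the XOR form gives the same counts as `==`, and why it depends on the arrays being boolean. The seed, the reduced array count, and the names `eq_counts`, `xor_counts`, `xor_counts_late`, `int_base` and `int_not` are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # seed chosen only so the sketch is repeatable
list_of_arrays = rng.integers(0, 2, (1000, 30, 30)).astype(bool)
base_array = rng.integers(0, 2, (30, 30)).astype(bool)

# Direct equality count per array.
eq_counts = (base_array == list_of_arrays).sum((1, 2))

# NOT applied to the small base first, then XOR: (not a) xor b == (a == b).
xor_counts = (~base_array ^ list_of_arrays).sum((1, 2))

# Same logic, but the NOT is applied to the large XOR result instead.
xor_counts_late = (~(base_array ^ list_of_arrays)).sum((1, 2))

assert np.array_equal(eq_counts, xor_counts)
assert np.array_equal(eq_counts, xor_counts_late)

# Caveat: on integer arrays ~ is a bitwise NOT, so ~0 == -1 and ~1 == -2.
# The XOR trick therefore only counts matches correctly for dtype=bool.
int_base = base_array.astype(np.int64)
int_not = ~int_base
```

All three expressions agree on boolean input; they differ only in how many elements the `~` has to touch.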

You can simplify the sum by raveling the last dimensions:

results = (base_array.ravel() == list_of_arrays.reshape(100000, -1)).sum(-1)
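The flattened form can be written without hard-coding the number of arrays. The following is a minimal sketch under assumed names (`n`, `flat_arrays`, `flat_base`, `similarity`); the final division into a match fraction is an illustrative addition, not something the original answer does.

```python
import numpy as np

rng = np.random.default_rng(2)  # seed chosen only so the sketch is repeatable
list_of_arrays = rng.integers(0, 2, (1000, 30, 30)).astype(bool)
base_array = rng.integers(0, 2, (30, 30)).astype(bool)

n = list_of_arrays.shape[0]                  # take the count from the data itself
flat_arrays = list_of_arrays.reshape(n, -1)  # shape (n, 900)
flat_base = base_array.ravel()               # shape (900,)

# One sum over the last axis replaces the sum over axes (1, 2).
flat_results = (flat_base == flat_arrays).sum(-1)
axis_results = (base_array == list_of_arrays).sum((1, 2))
assert np.array_equal(flat_results, axis_results)

# Optional: express each count as a fraction of matching cells (0.0 to 1.0).
similarity = flat_results / flat_base.size
```

Whether the flattened version is actually faster than summing over two axes is best measured on your own data, for example with `timeit`.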
