Comparing 2D boolean arrays
I am working on a problem where I need to compare one particular array to hundreds of thousands of others and return a list of results showing how similar each one is to it. I read that numpy is probably the best library for working with arrays (if there's anything better, please let me know :) so I scribbled this, but it's still slow. I am not the best at programming, so any help improving this would be immensely appreciated!
import numpy as np
list_of_arrays = [np.random.randint(0, 2, (30, 30)) for array in range(100000)]
base_array = np.random.randint(0, 2, (30, 30))
results = []
for array in list_of_arrays:
    results.append(np.sum(np.equal(base_array, array)))
You can use numpy broadcasting magic to do it in one line, without a list comprehension or loops of any kind:
results = np.equal(base_array, list_of_arrays).sum(axis=1).sum(axis=1)
You have so many arrays that it can't get much faster ;)
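As a quick sanity check, the sketch below confirms that the one-liner gives the same counts as the loop from the question. It uses only 1000 arrays and a seeded generator (`rng`), both of which are assumptions made purely to keep the check fast and repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for repeatability)

# Smaller than the question's 100000 arrays so the check runs quickly
list_of_arrays = [rng.integers(0, 2, (30, 30)) for _ in range(1000)]
base_array = rng.integers(0, 2, (30, 30))

# Loop version, as in the question
loop_results = [np.sum(np.equal(base_array, array)) for array in list_of_arrays]

# Broadcast version: the list is converted to a (1000, 30, 30) array and
# base_array (30, 30) is compared against every 30x30 slice at once
broadcast_results = np.equal(base_array, list_of_arrays).sum(axis=1).sum(axis=1)

print(np.array_equal(loop_results, broadcast_results))
```

Note that numpy has to convert the Python list to an array on every call here, so building the array once up front is likely to help if you run the comparison repeatedly.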
There are a number of efficient tricks for doing this in numpy. None of them require explicit loops or appending to a list.
First, make the list into an array:
list_of_arrays = np.random.randint(0, 2, (100000, 30, 30), dtype=bool)
Notice how much simpler (and faster) that is. Now make a boolean base:
base_array = np.random.randint(0, 2, (30, 30), dtype=bool)
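If your arrays already exist as a Python list rather than being generated at random, something along these lines should convert them. `existing_list` is a hypothetical stand-in for your real data, and the sketch assumes every array has the same shape and holds only 0s and 1s.

```python
import numpy as np

# Hypothetical stand-in for data you already have: a list of 0/1 integer arrays
existing_list = [np.random.randint(0, 2, (30, 30)) for _ in range(1000)]

# Stack into a single (1000, 30, 30) array, then convert 0/1 to False/True
list_of_arrays = np.stack(existing_list).astype(bool)

print(list_of_arrays.shape, list_of_arrays.dtype)
```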
The simplest comparison makes direct use of broadcasting:
results = (base_array == list_of_arrays).sum((1, 2))
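To see what the broadcast is doing, here is a small sketch that prints the intermediate shapes and turns the counts into a similarity fraction. Dividing by `base_array.size` is my own addition rather than part of the line above, and using a copy of the first array as the base is just a hypothetical way to get a known result (all 900 cells match).

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded generator (assumption, for repeatability)
list_of_arrays = rng.integers(0, 2, (500, 30, 30)).astype(bool)
base_array = list_of_arrays[0].copy()  # hypothetical base: a copy of the first array

matches = base_array == list_of_arrays  # (30, 30) broadcast against (500, 30, 30)
results = matches.sum((1, 2))           # matching cells per array, shape (500,)
similarity = results / base_array.size  # fraction of matching cells, 0.0 to 1.0

print(matches.shape, results.shape)
print(results[0], similarity[0])
```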
The equality of two booleans can also be obtained from their XOR:
results = (~base_array ^ list_of_arrays).sum((1, 2))
Running ~ on base_array is much faster than doing it on list_of_arrays or on the result of the XOR, because base_array has only 900 elements while the other two have 90 million each. It has the same logical effect.
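A rough way to check both claims yourself is sketched below: it verifies that all three placements of ~ agree with ==, then times each one with `timeit`. The array count is cut to 10000 and `number=20` is an arbitrary choice, so treat the timings as indicative only; they will vary by machine.

```python
import timeit
import numpy as np

# Fewer arrays than in the question so the timing loop stays short (assumption)
list_of_arrays = np.random.randint(0, 2, (10000, 30, 30), dtype=bool)
base_array = np.random.randint(0, 2, (30, 30), dtype=bool)

# For boolean data, all three placements of the NOT equal an elementwise ==
expected = base_array == list_of_arrays
on_base = ~base_array ^ list_of_arrays
on_list = base_array ^ ~list_of_arrays
on_result = ~(base_array ^ list_of_arrays)
print(np.array_equal(on_base, expected),
      np.array_equal(on_list, expected),
      np.array_equal(on_result, expected))

# Time each placement; no particular numbers are promised here
candidates = {
    "~ on base_array": lambda: ~base_array ^ list_of_arrays,
    "~ on list_of_arrays": lambda: base_array ^ ~list_of_arrays,
    "~ on XOR result": lambda: ~(base_array ^ list_of_arrays),
}
for label, func in candidates.items():
    seconds = timeit.timeit(func, number=20)
    print(f"{label}: {seconds:.4f} s for 20 runs")
```

This trick relies on the arrays being boolean. On the 0/1 integer arrays from the question, ~ is a bitwise NOT that produces -1 and -2, so the sums would come out wrong.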
You can simplify the sum by raveling the last dimensions:
results = (base_array.ravel() == list_of_arrays.reshape(100000, -1)).sum(-1)
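The sketch below checks that the flattened version gives the same counts as the 3D one. Using `len(list_of_arrays)` instead of a hard-coded 100000 is my own small variation, and the array count is reduced to keep it quick.

```python
import numpy as np

# Fewer arrays than in the question so the check runs quickly (assumption)
list_of_arrays = np.random.randint(0, 2, (1000, 30, 30), dtype=bool)
base_array = np.random.randint(0, 2, (30, 30), dtype=bool)

n = len(list_of_arrays)               # avoids hard-coding the number of arrays
flat = list_of_arrays.reshape(n, -1)  # shape (1000, 900)

# Compare a 900-element vector against each 900-element row, sum the last axis
results_flat = (base_array.ravel() == flat).sum(-1)
results_3d = (base_array == list_of_arrays).sum((1, 2))

print(flat.shape, np.array_equal(results_flat, results_3d))
```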