
Compare 2D arrays row-wise

This problem comes from the spatial analysis of unstructured grids in 3D. I have two 2D arrays to compare, each with 3 columns for the x, y, z coordinates. One array is the reference, and the other is evaluated against it (it is the result of a cKDTree query against the reference array). In the end, I want to know, for each row of the reference, how many rows of the query match it. I have tried to find an array-concatenation solution, but I get lost in the different dimensions.

import numpy as np
reference = np.array([[0,1,33],[0,33,36],[0,2,36],[1, 33, 34]])
query = np.array([[0,1,33],[0,1,33],[1, 33, 34],[0,33,36],[0,33,36],[0,1,33],[0,33,36]])

Something along these lines is where I am heading:

filter=reference[:,:,None]==query.all(axis=0)
result = filter.sum(axis=1)

but I cannot find the right way of broadcasting to compare the rows of the two arrays. The result should be:

np.array([3,3,0,1])
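To pin down the expected semantics, here is a plain nested-loop version: entry i of the result counts the query rows that are equal to reference[i]. This is only a slow illustrative sketch, and the function name count_matches_naive is hypothetical.

```python
import numpy as np

reference = np.array([[0, 1, 33], [0, 33, 36], [0, 2, 36], [1, 33, 34]])
query = np.array([[0, 1, 33], [0, 1, 33], [1, 33, 34], [0, 33, 36],
                  [0, 33, 36], [0, 1, 33], [0, 33, 36]])

def count_matches_naive(reference, query):
    """Hypothetical helper: for each reference row, count equal query rows."""
    result = np.zeros(len(reference), dtype=int)
    for i, ref_row in enumerate(reference):
        for q_row in query:
            # np.array_equal is True only if both rows have the same shape and values
            if np.array_equal(ref_row, q_row):
                result[i] += 1
    return result

print(count_matches_naive(reference, query))  # [3 3 0 1]
```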

You need to broadcast the two arrays. Since `==` compares element-wise, you cannot compare the rows (1D arrays) directly: you first need to reduce the comparison using `all` over the last dimension. Then you can count the matched rows with `sum`. Here is the resulting code:

(reference[None,:,:] == query[:,None,:]).all(axis=2).sum(axis=0)
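Breaking the one-liner into steps makes the broadcast shapes visible for the example data (7 query rows, 4 reference rows, 3 coordinates). The intermediate names eq and row_match are illustrative only.

```python
import numpy as np

reference = np.array([[0, 1, 33], [0, 33, 36], [0, 2, 36], [1, 33, 34]])
query = np.array([[0, 1, 33], [0, 1, 33], [1, 33, 34], [0, 33, 36],
                  [0, 33, 36], [0, 1, 33], [0, 33, 36]])

# (1, 4, 3) vs (7, 1, 3) broadcasts to (7, 4, 3):
# eq[q, r, c] is query[q, c] == reference[r, c]
eq = reference[None, :, :] == query[:, None, :]
print(eq.shape)         # (7, 4, 3)

# A query row matches a reference row only if all 3 coordinates are equal
row_match = eq.all(axis=2)
print(row_match.shape)  # (7, 4)

# Summing over the query axis gives one count per reference row
result = row_match.sum(axis=0)
print(result)           # [3 3 0 1]
```

Note that eq holds k × m × n booleans, which is where both the memory cost and the O(nmk) time of this approach come from.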

That being said, this solution is not the most efficient for bigger arrays. Indeed, for a `reference` with m rows of size n and a `query` with k rows, the complexity of this solution is O(nmk), while the optimal solution is O(nm + nk). This can be achieved using hash maps (aka `dict`). The idea is to put the rows of the `reference` array in a hash map with associated values set to 0, and then, for each row of `query`, increment the value stored under the key equal to that row (if the key is present). One then just needs to iterate over the hash map to get the final array. Hash map accesses are done in (amortized) constant time, plus O(n) to hash each row, which gives the O(nm + nk) total. Unfortunately, a Python `dict` does not accept NumPy arrays as keys, since arrays cannot be hashed, but tuples can. Here is an example:

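# One counter per reference row, keyed by the row converted to a tuple
counts = {tuple(row): 0 for row in reference}

for row in query:
    key = tuple(row)
    # Only count query rows that also appear in the reference
    if key in counts:
        counts[key] += 1

print(list(counts.values()))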

This prints: [3, 3, 0, 1].

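Note that hash maps generally do not preserve insertion order, but Python dicts do (this is guaranteed since Python 3.7), so the counts come out in the order of the rows of `reference`, as long as `reference` contains no duplicate rows. Alternatively, one can rebuild the final array explicitly by looking up each row of `reference` in the hash map.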
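One way to do that lookup is sketched below. It is a variant of the dict approach, not the answerer's code: it counts every query row with `collections.Counter`, then looks up each reference row in order, so duplicate reference rows each get their own entry. The name count_row_matches is hypothetical.

```python
from collections import Counter

import numpy as np

def count_row_matches(reference, query):
    """Hypothetical helper: number of query rows equal to each reference row."""
    # ndarrays are unhashable, so each row becomes a tuple of Python scalars
    query_counts = Counter(map(tuple, query.tolist()))
    # Counter returns 0 for missing keys, so unmatched reference rows give 0
    return np.array([query_counts[tuple(row)] for row in reference.tolist()],
                    dtype=int)

reference = np.array([[0, 1, 33], [0, 33, 36], [0, 2, 36], [1, 33, 34]])
query = np.array([[0, 1, 33], [0, 1, 33], [1, 33, 34], [0, 33, 36],
                  [0, 33, 36], [0, 1, 33], [0, 33, 36]])

print(count_row_matches(reference, query))  # [3 3 0 1]
```

Because the output is built by iterating over `reference`, its order never depends on the map's iteration order.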

This dict-based solution may be slower than the broadcasting one for small arrays, since it runs a Python-level loop, but it should scale better for large ones because it never builds the k × m × n comparison array.
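If a Python loop is too slow, a NumPy-only alternative is possible. Note that it is a different technique from the answer above: it is sort-based rather than hash-based, so its cost is roughly O(n(m + k) log(m + k)). The sketch labels every distinct row with `np.unique(..., axis=0, return_inverse=True)` and counts the query labels with `np.bincount`. The function name count_row_matches_unique is hypothetical.

```python
import numpy as np

def count_row_matches_unique(reference, query):
    """Hypothetical sort-based alternative using np.unique over rows."""
    combined = np.concatenate([reference, query])
    # Give every distinct row an integer id (rows are compared as a whole)
    _, inverse = np.unique(combined, axis=0, return_inverse=True)
    # Defensive: make sure the id array is 1-D across NumPy versions
    inverse = inverse.reshape(-1)
    ref_ids = inverse[:len(reference)]
    query_ids = inverse[len(reference):]
    # Count how often each row id occurs among the query rows
    counts = np.bincount(query_ids, minlength=inverse.max() + 1)
    # Each reference row reads off the count of its own id
    return counts[ref_ids]

reference = np.array([[0, 1, 33], [0, 33, 36], [0, 2, 36], [1, 33, 34]])
query = np.array([[0, 1, 33], [0, 1, 33], [1, 33, 34], [0, 33, 36],
                  [0, 33, 36], [0, 1, 33], [0, 33, 36]])

print(count_row_matches_unique(reference, query))  # [3 3 0 1]
```

Like the lookup variant, this handles duplicate reference rows, and the output always follows the order of `reference`.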

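Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM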