Row-by-row comparison with for loops in NumPy is slow - how can I improve it?

Question

I am using Python and NumPy to compare two arrays of equal shape that contain the same coordinates (x, y, z), in order to match them, as follows:

coordsCFS
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.03      ,  0.02      ,  0.        ],
       [ 0.02      ,  0.025     ,  0.        ],
        ..., 
       [ 0.02958333,  0.029375  ,  0.        ],
       [ 0.02958333,  0.0290625 ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

and

coordsRMED
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.02083333,  0.02      ,  0.        ],
       [ 0.02083333,  0.020625  ,  0.        ],
       ..., 
       [ 0.03      ,  0.0296875 ,  0.        ],
       [ 0.02958333,  0.03      ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

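The data is read from two HDF5 files with h5py. For the comparison I use allclose, which tests for "almost equal". The coordinates do not match within Python's regular floating-point precision, which is why I use a for loop; otherwise numpy.where would do the job. I usually try to avoid for loops, but in this case I don't know how. So I came up with this surprisingly slow snippet: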

import numpy


def map_coords(coordsCFS, coordsRMED):
    mapList = []
    for cfsXYZ in coordsCFS:
        # collect the indices of all rows in coordsRMED that are close to cfsXYZ
        match = []
        for indexMatch, asterXYZ in enumerate(coordsRMED):
            if numpy.allclose(asterXYZ, cfsXYZ):
                match.append(indexMatch)

        # check: must find exactly one match
        if len(match) != 1:
            print("ERROR matching")
            print(match)
            print(cfsXYZ)
            return 1

        # save to list
        mapList.append(match[0])

    if len(mapList) != coordsRMED.shape[0]:
        print("ERROR: matching consistency check")
        print(mapList)
        return 1

    return mapList

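For my test sample size (800 rows) this is very slow, and I plan to compare much larger sets. I could drop the consistency check and use break in the inner for loop to gain some speed. Is there a better way?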

Answer 1

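One solution is to sort both arrays (adding an index column so that the sorted arrays still contain the original indices). Then, to match them, step through the two arrays in lockstep. Because you expect an exact 1-1 correspondence, you should always be able to match pairs of rows.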
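A minimal sketch of the sort-and-lockstep idea, assuming both inputs are NumPy arrays of shape (N, 3) with an exact one-to-one correspondence. Instead of appending an index column, it keeps the permutations returned by np.lexsort, which serve the same purpose. The sort keys are rounded to a hypothetical `decimals` places, an assumption meant to stop rounding-level noise from reordering rows whose x values are "equal"; the lockstep walk itself is done as a single vectorized np.allclose over the two sorted arrays. The function name match_by_sorting is hypothetical.

```python
import numpy as np


def match_by_sorting(coordsCFS, coordsRMED, decimals=8):
    """Return idx such that coordsRMED[idx[i]] is close to coordsCFS[i].

    Sketch of the sort-then-lockstep idea; assumes a 1-1 correspondence.
    """
    if coordsCFS.shape != coordsRMED.shape:
        raise ValueError("arrays must have the same shape")
    # Round only the sort keys (decimals is an assumed precision) so that
    # rounding-level noise does not change the sort order.
    keysCFS = np.round(coordsCFS, decimals)
    keysRMED = np.round(coordsRMED, decimals)
    # np.lexsort uses the LAST key as the primary one, so reverse the columns
    # to sort by x, then y, then z. The returned permutations play the role
    # of the extra index column.
    orderCFS = np.lexsort(keysCFS.T[::-1])
    orderRMED = np.lexsort(keysRMED.T[::-1])
    # Lockstep walk, vectorized: row k of one sorted array must match
    # row k of the other.
    if not np.allclose(coordsCFS[orderCFS], coordsRMED[orderRMED]):
        raise ValueError("sorted arrays do not match row by row")
    idx = np.empty(len(coordsCFS), dtype=np.intp)
    idx[orderCFS] = orderRMED
    return idx


a = np.array([[0.02, 0.02, 0.0], [0.03, 0.02, 0.0], [0.02, 0.025, 0.0]])
b = a[[2, 0, 1]] + 1e-12  # same points, shuffled, with tiny noise
print(match_by_sorting(a, b))  # [1 2 0]
```

Sorting costs O(N log N) instead of the O(N·M) pairwise comparison. A value that sits right on a rounding boundary could still sort differently in the two arrays; in that case the np.allclose check raises rather than returning a wrong mapping.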

Answer 2

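The first thing to remember is that, by default, NumPy "iteration always takes place in C-style contiguous order (last index varying the fastest)" [1]. You might improve things by reversing the order of iteration (iterating over coordsRMED.T, the transpose of coordsRMED...).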

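Still, I am surprised that you need a loop at all. You say that "the coordinates do not match within Python's regular floating-point precision": have you tried adjusting the rtol and atol parameters of np.allclose, as described in its documentation?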

[1]
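Following up on the rtol/atol suggestion, here is a sketch (not code from the answer above) of how np.isclose with broadcasting can remove both Python loops: comparing an (N, 1, 3) view against a (1, M, 3) view yields an (N, M, 3) boolean array. The function name match_all_pairs and the default atol are assumptions; rtol=0 makes the tolerance purely absolute, which suits coordinates close to zero. The intermediate array has N·M·3 entries, so much larger sets would have to be processed in chunks of rows.

```python
import numpy as np


def match_all_pairs(coordsCFS, coordsRMED, atol=1e-8):
    """Return idx such that coordsRMED[idx[i]] is close to coordsCFS[i]."""
    # (N, 1, 3) vs (1, M, 3) broadcasts to an (N, M, 3) comparison;
    # .all(axis=2) keeps the pairs that agree in x, y and z.
    close = np.isclose(coordsCFS[:, None, :], coordsRMED[None, :, :],
                       rtol=0.0, atol=atol).all(axis=2)
    # Same requirement as in the question: exactly one match per CFS row.
    counts = close.sum(axis=1)
    if np.any(counts != 1):
        bad = np.flatnonzero(counts != 1)
        raise ValueError(f"rows without a unique match: {bad}")
    # Each row has a single True, so argmax returns its column index.
    return close.argmax(axis=1)


a = np.array([[0.02, 0.02, 0.0], [0.03, 0.02, 0.0], [0.02, 0.025, 0.0]])
b = a[[2, 0, 1]] + 1e-12  # same points, shuffled, with tiny noise
print(match_all_pairs(a, b))  # [1 2 0]
```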

Answer 3

You can get rid of the inner loop with:

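import numpy
TOLERANCE = 1e-8  # assumed absolute tolerance, adjust to your data
for cfsXYZ in coordsCFS:
    # indices of the rows of coordsRMED whose largest coordinate difference is below TOLERANCE
    match = numpy.nonzero(
        numpy.max(numpy.abs(coordsRMED - cfsXYZ), axis=1) < TOLERANCE)[0]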
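As a sketch of how this fits into the question's function (the name map_coords_vectorized and the tol default are assumptions), the one-liner can replace the inner loop while keeping the exactly-one-match check; this version raises an exception instead of returning 1. Only the outer loop remains in Python, and each row is compared against all of coordsRMED in one vectorized step instead of M separate calls to numpy.allclose.

```python
import numpy as np


def map_coords_vectorized(coordsCFS, coordsRMED, tol=1e-8):
    """Return mapList with coordsRMED[mapList[i]] close to coordsCFS[i]."""
    mapList = np.empty(len(coordsCFS), dtype=np.intp)
    for i, cfsXYZ in enumerate(coordsCFS):
        # Largest absolute coordinate difference between cfsXYZ and every row
        # of coordsRMED, computed in one step instead of an inner loop.
        dist = np.max(np.abs(coordsRMED - cfsXYZ), axis=1)
        match = np.nonzero(dist < tol)[0]
        # Same check as in the question: exactly one match is required.
        if len(match) != 1:
            raise ValueError(f"row {i} {cfsXYZ} has matches {match}")
        mapList[i] = match[0]
    return mapList


a = np.array([[0.02, 0.02, 0.0], [0.03, 0.02, 0.0], [0.02, 0.025, 0.0]])
b = a[[2, 0, 1]] + 1e-12  # same points, shuffled, with tiny noise
print(map_coords_vectorized(a, b))  # [1 2 0]
```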

3 answers

Answer 1: score 1

Answer 2: score 1, 2012-10-05 08:15:34

Answer 3: score 1, accepted, 2012-10-05 08:39:35
