Row-by-row comparison with a for loop in NumPy is slow: how can I improve it?

Question

I am using Python and NumPy to compare two arrays of equal shape that hold coordinates (x, y, z), in order to match their rows. They look like this:

coordsCFS
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.03      ,  0.02      ,  0.        ],
       [ 0.02      ,  0.025     ,  0.        ],
        ..., 
       [ 0.02958333,  0.029375  ,  0.        ],
       [ 0.02958333,  0.0290625 ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

and

coordsRMED
array([[ 0.02      ,  0.02      ,  0.        ],
       [ 0.02083333,  0.02      ,  0.        ],
       [ 0.02083333,  0.020625  ,  0.        ],
       ..., 
       [ 0.03      ,  0.0296875 ,  0.        ],
       [ 0.02958333,  0.03      ,  0.        ],
       [ 0.02958333,  0.0296875 ,  0.        ]])

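The data is read from two HDF5 files with h5py. For the comparison I use allclose, which tests for "almost equality". The coordinates do not match exactly within Python's regular float precision. That is why I use the for loop; otherwise numpy.where would have worked. I generally try to avoid for loops, but in this case I could not figure out how. So I came up with this surprisingly slow snippet: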

import numpy

# The return statements below imply that this snippet runs inside a function.
mapList = []
for cfsXYZ in coordsCFS:
    # print(cfsXYZ)
    match = []
    for indexMatch, asterXYZ in enumerate(coordsRMED):
        if numpy.allclose(asterXYZ, cfsXYZ):
            match.append(indexMatch)
            # print("Found match at index " + str(indexMatch))
            # print(asterXYZ)

    # check: must only find one match.
    if len(match) != 1:
        print("ERROR matching")
        print(match)
        print(cfsXYZ)
        return 1

    # save to list
    mapList.append(match[0])

# consistency check: every row must have been mapped
if len(mapList) != coordsRMED.shape[0]:
    print("ERROR: matching consistency check")
    print(mapList)
    return 1

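For my test sample size (800 rows) this is very slow, and I plan to compare much larger sets. I could remove the consistency check and use break in the inner for loop to gain some speed. Is there a better way?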

Answer 1

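One solution is to sort both arrays (adding an index column so that the sorted arrays still contain the original indices). Then, to match, step through the two arrays in lock-step. Since you expect an exact 1-1 correspondence, you should always be able to match off pairs of rows.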
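The sorting idea could be sketched roughly as follows. This is only an illustration, not the answerer's code: the function name match_by_sorting and the decimals parameter are made up here, and instead of an explicit index column it uses the permutation returned by numpy.lexsort. It also assumes that rounding makes nearly equal coordinates sort identically, which can fail if a value sits right on a rounding boundary.

```python
import numpy as np

def match_by_sorting(coords_a, coords_b, decimals=6):
    """Return idx such that coords_b[idx[i]] is close to coords_a[i].

    Hypothetical helper; `decimals` is an assumed rounding precision.
    """
    if coords_a.shape != coords_b.shape:
        raise ValueError("arrays must have the same shape")
    # Round only to build sort keys, so that tiny float noise does not
    # change the ordering (assumption: no value lies on a rounding boundary).
    key_a = np.round(coords_a, decimals)
    key_b = np.round(coords_b, decimals)
    # lexsort treats the LAST key as primary: sort by x, then y, then z.
    order_a = np.lexsort((key_a[:, 2], key_a[:, 1], key_a[:, 0]))
    order_b = np.lexsort((key_b[:, 2], key_b[:, 1], key_b[:, 0]))
    # In sorted order the rows should pair up one to one.
    if not np.allclose(coords_a[order_a], coords_b[order_b]):
        raise ValueError("arrays do not match one to one")
    # Map each original row of coords_a to its partner row in coords_b.
    idx = np.empty(len(coords_a), dtype=int)
    idx[order_a] = order_b
    return idx
```

Sorting costs O(n log n) instead of the O(n²) of the nested loop, so the gap should widen as the arrays grow.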

Answer 2

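The first thing to remember is that, by default, in NumPy "iteration always proceeds in a C-style contiguous fashion (last index varies the fastest)" [1]. You might improve things by reversing the order of iteration (iterating over coordsRMED.T, the transpose of coordsRMED...)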

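Still, I am surprised that you need a loop at all. You state that "the coordinates do not match within Python's regular float precision": have you tried adjusting the rtol and atol parameters of np.allclose, as described in its documentation?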
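As a small illustration of how rtol and atol change the outcome, here is a sketch with made-up sample values (the arrays a and b are not taken from the question's files). np.allclose treats two values as close when |a - b| <= atol + rtol * |b|.

```python
import numpy as np

# Two rows that differ only by a small amount in the first coordinate.
a = np.array([0.02958333, 0.0296875, 0.0])
b = np.array([0.0295833333, 0.0296875, 0.0])

# Default tolerances are rtol=1e-05 and atol=1e-08.
default_ok = np.allclose(a, b)

# A purely absolute tolerance that is tighter than the difference.
strict_ok = np.allclose(a, b, rtol=0.0, atol=1e-12)

# A purely absolute tolerance that is looser than the difference.
loose_ok = np.allclose(a, b, rtol=0.0, atol=1e-6)

print(default_ok, strict_ok, loose_ok)
```

Setting rtol=0.0 makes the test depend on atol alone, which can be easier to reason about for coordinates near zero.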

[1]

Answer 3

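You can get rid of the inner loop with something like:

for cfsXYZ in coordsCFS:
    # Indices of rows whose largest per-axis difference is below TOLERANCE.
    match = numpy.nonzero(
        numpy.max(numpy.abs(coordsRMED - cfsXYZ), axis=1) < TOLERANCE)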
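Going one step further, the outer loop can also be replaced by broadcasting. The following is a sketch under assumptions, not part of the answer: the function name match_rows and the value of TOLERANCE are chosen here for illustration only. It builds an N x M x 3 intermediate array, so it should be fine for 800 rows but may need too much memory for much larger inputs.

```python
import numpy as np

TOLERANCE = 1e-8  # assumed absolute tolerance, adjust to your data

def match_rows(coords_cfs, coords_rmed, tol=TOLERANCE):
    """Return, for each row of coords_cfs, the index of its match in coords_rmed.

    Hypothetical helper; raises ValueError unless every row has exactly one match.
    """
    # diff[i, j] is the largest per-axis difference between
    # coords_cfs[i] and coords_rmed[j].
    diff = np.abs(coords_cfs[:, None, :] - coords_rmed[None, :, :]).max(axis=2)
    close = diff < tol
    # Same check as in the question: exactly one match per row.
    counts = close.sum(axis=1)
    if not np.all(counts == 1):
        bad = np.nonzero(counts != 1)[0]
        raise ValueError(f"rows without a unique match: {bad.tolist()}")
    # With exactly one True per row, argmax gives its column index.
    return close.argmax(axis=1)
```

The returned array plays the role of mapList in the question, so coords_rmed[match_rows(coords_cfs, coords_rmed)] should line up row by row with coords_cfs.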
