Avoiding nested for loops with list comprehension and/or map

Question

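For days I have been trying to optimize (not just make it look nicer) three nested loops that contain a conditional and a function call. What I have right now is: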

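import numpy as np

def build_prolongation_operator(p, qs):
    '''
    p: dimension of the coarse basis
    qs: number of fine basis vectors (children) of each coarse basis
        vector; q = sum(qs) is the dimension of the fine basis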

    The prolongation operator describes the relationship between
    the coarse and fine bases:    
    V_coarse = np.dot(V_fine, I)
    '''

    q = sum(qs)

    I = np.zeros([q, p])

    for i in range(0, q):
        for j in range(0, p):
            for k in range(0, qs[j]):
                # if BV i is a child of j, we set I[i, j] = 1
                if i == f_map(j, k, qs):
                    I[i, j] = 1
                    break

    return I

where f_map is:

def f_map(i, j, q):
    '''
    Mapping which returns the index k of the fine basis vector which
    corresponds to the jth child of the ith coarse basis vector.    
    '''

    if j < 0 or j > q[i]:
        print('ERROR in f_map')
        return None

    result = j

    for k in range(0, i):
        result += q[k]

    return result

When profiling my whole code, I see that build_prolongation_operator is called 45 times, while f_map is called about 8.5 million times!

Here is the picture: [profiler output image not available]

I tried to do the same with list comprehension and map, but without any luck.

Here is a sample of the input that build_prolongation_operator expects:

from numpy.random import randint

p = 10
qs = randint(3, size=p)

Answer 1

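I don't know about bases and prolongation operators, but you should focus on the algorithm itself. That is almost always sound advice when it comes to optimization.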

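Here is probably the crux, and if not, it is something to get you going: the computation in f_map does not depend on i, but you recompute it for every value of i. Since i ranges from zero to the sum of the values in qs, you will save a huge amount of recomputation by caching the results; search for "Python memoize" and it will practically write itself. Fix this and you are probably done, without any micro-optimizations.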

You will need enough space to store at most p * max(qs) values, but judging from the number of calls you report, that should not be much of a hurdle.
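As a minimal sketch of the caching idea above, the standard library's functools.lru_cache can memoize the mapping. The helper name make_cached_f_map is hypothetical, and the sketch assumes qs does not change while the cache is in use (it is frozen into a tuple for that reason).

```python
from functools import lru_cache

def make_cached_f_map(qs):
    """Return a memoized f_map(j, k) bound to a fixed qs (hypothetical helper)."""
    qs = tuple(int(q) for q in qs)  # freeze qs so cached results stay valid

    @lru_cache(maxsize=None)
    def f_map_cached(j, k):
        # Same value as the original f_map: k plus the sizes of all earlier blocks.
        return k + sum(qs[:j])

    return f_map_cached

f_map_cached = make_cached_f_map([2, 0, 1, 3])
first = f_map_cached(3, 1)   # computed on the first call
second = f_map_cached(3, 1)  # served from the cache on the second call
```

Each distinct (j, k) pair is computed once, so repeated calls from the loop over i become dictionary lookups.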

Answer 2

Try checking whether this works:

for j in range(0,p):
    for k in range(0, qs[j]):
        # if BV i is a child of j, we set I[i,j] = 1
        val = f_map(j,k,qs)
        if I[val, j] == 0:
            I[val, j] = 1
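For reference, the two-loop snippet above could be wrapped into a complete function roughly as follows. This is only a sketch: the function name is borrowed from the benchmark that appears later in this thread, and the body is an assumption about what was timed there. The f_map shown is the question's version without the sanity check.

```python
import numpy as np

def f_map(j, k, qs):
    # Index of the k-th child of coarse basis vector j.
    result = k
    for i in range(0, j):
        result += qs[i]
    return result

def build_prolongation_operator_RaunaqJain(p, qs):
    # Assumed wrapper: visit each (j, k) pair once instead of scanning every i.
    q = sum(qs)
    I = np.zeros([q, p])
    for j in range(0, p):
        for k in range(0, qs[j]):
            I[f_map(j, k, qs), j] = 1
    return I

I = build_prolongation_operator_RaunaqJain(3, [2, 0, 1])
```

The loop over i disappears because f_map already yields the row index directly.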

Answer 3

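For one thing, you really don't need p as an argument to your function: len(qs) only needs to be called once, and it is very cheap. If your input is always a numpy array (and under the circumstances there is no reason it should not be), qs.size will do as well.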

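Let's start by rewriting f_map. The loop there is just the cumulative sum of qs (but starting at zero), which you can pre-compute once (or at least once per call of the outer function).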

def f_map(i, j, cumsum_q):
    return j + cumsum_q[i]

where cumsum_q would be defined in build_prolongation_operator as

cumsum_q = np.roll(np.cumsum(qs), 1)
cumsum_q[0] = 0
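To make the roll trick concrete, here is a small worked sketch with made-up sample values for qs. The name alt is an illustrative alternative, not part of the answer.

```python
import numpy as np

qs = np.array([2, 0, 1, 3])

# cumsum gives inclusive totals; rolling by one moves the grand total to slot 0.
cumsum_q = np.roll(np.cumsum(qs), 1)
q = cumsum_q[0]   # total number of fine basis vectors, read before overwriting
cumsum_q[0] = 0   # now cumsum_q[j] is the sum of qs[:j]

# An equivalent way to build the same exclusive prefix sums.
alt = np.concatenate(([0], np.cumsum(qs)[:-1]))
```

After this, cumsum_q[j] is the index of the first child of coarse vector j, which is exactly what the loop in f_map accumulated.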

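I am sure you can appreciate the usefulness of having the same set of variable names inside f_map as in build_prolongation_operator. To make it even easier, we can remove f_map entirely and use the expression it represents directly in your condition: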

if i == k + cumsum_q[j]:
    I[i, j] = 1

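The loop over k then means "if i equals k + cumsum_q[j] for any k, set the element to 1". If we rewrite the condition as i - cumsum_q[j] == k, you can see that we do not need a loop over k at all: i - cumsum_q[j] will be equal to some k in the range [0, qs[j]) exactly when it is non-negative and strictly less than qs[j]. You can just check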

if i >= cumsum_q[j] and i - cumsum_q[j] < qs[j]:
    I[i, j] = 1

This reduces your loop to one iteration per element of the matrix. In terms of iteration count, you cannot do better than that:

def build_prolongation_operator_optimized(qs):
    '''
    The prolongation operator describes the relationship between
    the coarse and fine bases:    
    V_coarse = np.dot(V_fine, I)
    '''
    qs = np.asanyarray(qs)
    p = qs.size
    cumsum_q = np.roll(np.cumsum(qs), 1)
    q = cumsum_q[0]
    cumsum_q[0] = 0

    I = np.zeros([q, p])

    for i in range(0, q):
        for j in range(0, p):
            # if BV i is a child of j, we set I[i, j] = 1
            if 0 <= i - cumsum_q[j] < qs[j]:
                I[i, j] = 1
    return I

Now that you know the formula for each cell, you can have numpy compute the whole matrix for you in essentially one line using broadcasting:

def build_prolongation_operator_numpy(qs):
    qs = np.asanyarray(qs)
    cumsum_q = np.roll(np.cumsum(qs), 1)
    q = cumsum_q[0]
    cumsum_q[0] = 0
    i_ = np.arange(q).reshape(-1, 1)  # Make this a column vector
    return (i_ >= cumsum_q) & (i_ - cumsum_q < qs)
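The broadcasting step can be followed on a tiny example. The sketch below uses made-up values for qs and splits the one-line return into named intermediate masks (lower, upper, mask are illustrative names only).

```python
import numpy as np

qs = np.array([2, 0, 1])
cumsum_q = np.array([0, 2, 2])             # exclusive prefix sums of qs
i_ = np.arange(qs.sum()).reshape(-1, 1)    # column vector, shape (3, 1)

# A (3, 1) column against a (3,) row broadcasts to a (3, 3) grid of (i, j) pairs.
lower = i_ >= cumsum_q        # i is at or past the first child of j
upper = i_ - cumsum_q < qs    # i is before the first child of j + 1
mask = lower & upper          # True where fine vector i is a child of coarse vector j
```

Row i of mask has a single True in the column of its parent; a coarse vector with qs[j] == 0 gets an all-False column.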

I ran a small demo to make sure that (A) the proposed solutions give the same result as your original, and (B) they run faster:

In [1]: p = 10
In [2]: qs = np.random.randint(3, size=p)

In [3]: ops = (
...     build_prolongation_operator(p, qs),
...     build_prolongation_operator_optimized(qs),
...     build_prolongation_operator_numpy(qs),
...     build_prolongation_operator_RaunaqJain(p, qs),
...     build_prolongation_operator_gboffi(p, qs),
... )

In [4]: np.array([[(op1 == op2).all() for op1 in ops] for op2 in ops])
Out[4]: 
array([[ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True]])

In [5]: %timeit build_prolongation_operator(p, qs)
321 µs ± 890 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit build_prolongation_operator_optimized(qs)
75.1 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit build_prolongation_operator_numpy(qs)
24.8 µs ± 77.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: %timeit build_prolongation_operator_RaunaqJain(p, qs)
28.5 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %timeit build_prolongation_operator_gboffi(p, qs)
31.8 µs ± 772 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: %timeit build_prolongation_operator_gboffi2(p, qs)
26.6 µs ± 768 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

As you can see, the fastest option is the fully vectorized one, but the options by @RaunaqJain and @gboffi come in a very close second.

Note

My vectorized solution creates a boolean array. If you do not want that, either use I.astype(...) to convert to the desired dtype, or view it as an np.uint8 array: I.view(dtype=np.uint8).
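A short sketch of the two conversions mentioned in the note, using a small hand-written boolean array in place of the real operator:

```python
import numpy as np

mask = np.array([[True, False], [False, True]])

# astype allocates a new array with the requested dtype.
as_float = mask.astype(np.float64)

# view reinterprets the same buffer; this works because bool and uint8
# both occupy one byte per element.
as_uint8 = mask.view(dtype=np.uint8)
```

The view costs no copy, but it shares memory with the boolean array, so writing to one changes the other.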

Answer 4

This is the optimized loop proposed by Raunaq Jain in his answer

for j in range(0,p):
    for k in range(0, qs[j]):
        # if BV i is a child of j, we set I[i,j] = 1
        val = f_map(j,k,qs)
        if I[val, j] == 0:
            I[val, j] = 1

This is the f_map function, where I have edited the argument names to reflect the names used by the caller

def f_map(j,k,qs):
    if k < 0 or k > qs[j]:
        print('ERROR in f_map')
        return None
    result = k
    for i in range(0, j):
        result += qs[i]
    return result

First, by the definition of the loop on k, we always have 0 ≤ k < qs[j], so we can safely remove the sanity check and write

def f_map(j,k,qs):
    result = k
    for i in range(0, j):
        result += qs[i]
    return result

Now, this is the docstring of the sum builtin

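Signature: sum(iterable, start=0, /)
Docstring:
Return the sum of a 'start' value (default: 0) plus an iterable of numbers

When the iterable is empty, return the start value.
This function is intended specifically for use with numeric values and may
reject non-numeric types.
Type:      builtin_function_or_method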

It is apparent that we can write

def f_map(j,k,qs):
    return sum(qs[:j], k)

and it is also apparent that we can do without the function call

for j in range(0,p):
    for k in range(0, qs[j]):
        # if BV i is a child of j, we set I[i,j] = 1
        val = sum(qs[:j], k)
        if I[val, j] == 0:
            I[val, j] = 1

Calling a builtin should be more efficient than a function call and a loop, shouldn't it?
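Whatever the timing turns out to be, the rewrite is only worth it if it returns the same indices. The following sketch checks that for every valid (j, k) pair on a made-up qs; the names f_map_loop and f_map_sum are hypothetical labels for the two variants discussed above.

```python
def f_map_loop(j, k, qs):
    # Explicit accumulation, as in the version without the sanity check.
    result = k
    for i in range(0, j):
        result += qs[i]
    return result

def f_map_sum(j, k, qs):
    # Same value: k is passed as the 'start' argument of the builtin sum.
    return sum(qs[:j], k)

qs = [2, 0, 1, 3]
pairs = [(j, k) for j in range(len(qs)) for k in range(qs[j])]
indices = [f_map_sum(j, k, qs) for j, k in pairs]
same = all(f_map_loop(j, k, qs) == f_map_sum(j, k, qs) for j, k in pairs)
```

Note that qs[:j] creates a new slice on every call, so the work per call still grows with j.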

Addressing Mad Physicist's remark

We can precompute the partial sums of qs to get a further speed-up

sqs = [sum(qs[:i]) for i in range(len(qs))] # there are faster ways...
...
for j in range(0,p):
    for k in range(0, qs[j]):
        # if BV i is a child of j, we set I[i,j] = 1
        val = k+sqs[j]
        if I[val, j] == 0:
            I[val, j] = 1
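One possible reading of the "faster ways" hinted at in the comment is a single-pass running total. The sketch below uses itertools.accumulate from the standard library; the helper name partial_sums is hypothetical.

```python
from itertools import accumulate

def partial_sums(qs):
    """Exclusive prefix sums of qs in one pass (hypothetical helper)."""
    # accumulate yields inclusive totals; prepend 0 and drop the grand total.
    return [0, *accumulate(qs)][:-1]

qs = [2, 0, 1, 3]
sqs_fast = partial_sums(qs)
sqs_slow = [sum(qs[:i]) for i in range(len(qs))]  # quadratic version, for comparison
```

Both lists hold the index of the first child of each coarse vector, but the accumulate version touches each element of qs once.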

4 answers

Answer 1
3 2018-09-13 10:40:01

Answer 2
1 2018-09-13 11:07:41

Answer 3
1 accepted 2018-09-13 12:09:18

Answer 4
1 2018-09-13 13:33:31
