Nested for loops slow down performance

Question

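I have a fairly large text file (about 16k lines). I loop over every line and check whether it contains a client IP:Port, a server IP:Port, and a keyword, using two for loops with nested if x in line statements to check whether the line contains the information I'm looking for.

Once I've identified a line that contains the values I'm looking for, I update a sqlite database. Initially this took a great deal of time because I wasn't wrapping the SQL UPDATE statements in a manual transaction. After that change the execution time improved significantly, but the code below still takes several minutes to complete, and I assume my terrible loop structure is to blame.

If anyone has performance tips to help speed up the code below, I'd appreciate it: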

c.execute("SELECT client_tuple, origin_tuple FROM connections")
# returns ~ 8k rows each with two items, clientIP:port and serverIP:port
tuples = c.fetchall()

with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # for each row returned from sql query
        for tuple in tuples:
            # if the client tuple (IP:Port) is in the line
            if tuple[0] in line:
                # if the origin tuple (IP:Port) is in the line
                if tuple[1] in line:
                    # if 'foo' is in the line
                    if 'foo' in line:
                        # lookup some value and update SQL with the value found
                        bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                        c.execute("UPDATE connections "
                                    " SET bar = ? "
                                   "WHERE client_tuple = ? AND origin_tuple = ?",
                                    (bar_value[0], tuple[0], tuple[1]))

    conn.commit()

Answer 1

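The if 'foo' in line: check should come before the for tuple in tuples: loop, so that you automatically skip lines that don't need any processing.

A second, small improvement: compile the regexp outside the loop and use the compiled matcher.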
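A minimal sketch of the second suggestion, assuming the same pattern as in the question. The names BAR_PATTERN and extract_bar and the sample lines are made up for illustration; they are not from the original code.

```python
import re

# Compiled once, outside any loop. Same pattern as in the question.
BAR_PATTERN = re.compile(r'(?<=bar\s).+?(?=\,)')


def extract_bar(line):
    """Return the text between 'bar ' and the next comma, or None if absent."""
    match = BAR_PATTERN.search(line)
    return match.group(0) if match else None


# Hypothetical sample lines, for illustration only.
lines = [
    "10.0.0.1:5000 192.168.1.1:80 foo bar 42, done",
    "10.0.0.2:5001 192.168.1.2:80 nothing here",
]
for line in lines:
    print(extract_bar(line))
```

Python's re module already caches recently compiled patterns, so the gain is likely to be small, as the answer says. Using search() also avoids the IndexError that bar_value[0] raises when a line has no match.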

Answer 2

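Unfortunately you can't tighten the for loops themselves, because you need to go through all the tuples for every line in the file. But you can tighten the code slightly by merging the if statements. You should also check for 'foo' before iterating over all the tuples.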

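with open('connection_details.txt', 'r') as f:
    c.execute('BEGIN TRANSACTION')
    # for each line in ~16k lines
    for line in f:
        # skip the tuple loop entirely unless 'foo' is in the line
        if 'foo' in line:
            # for each row returned from sql query
            for tup in tuples:
                if tup[0] in line and tup[1] in line:
                    # lookup some value and update SQL with the value found
                    bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
                    c.execute("UPDATE connections SET bar = ? "
                              "WHERE client_tuple = ? AND origin_tuple = ?",
                              (bar_value[0], tup[0], tup[1]))

    conn.commit()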
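For anyone who wants to try the restructured loop end to end, here is a self-contained sketch against an in-memory sqlite database. The table layout, the update_bars helper, and the sample rows and lines are all assumptions made for illustration. Unlike the question's code, it runs the regex once per 'foo' line and skips lines where the pattern does not match.

```python
import re
import sqlite3

BAR_PATTERN = re.compile(r'(?<=bar\s).+?(?=\,)')


def update_bars(conn, lines):
    """Set connections.bar for each row whose two tuples appear in a 'foo' line."""
    cur = conn.cursor()
    cur.execute("SELECT client_tuple, origin_tuple FROM connections")
    tuples = cur.fetchall()
    updates = []
    for line in lines:
        if 'foo' not in line:
            continue  # cheap test first: skip the inner loop entirely
        match = BAR_PATTERN.search(line)
        if match is None:
            continue  # no 'bar <value>,' in this line
        for client, origin in tuples:
            if client in line and origin in line:
                updates.append((match.group(0), client, origin))
    with conn:  # one transaction for all updates; commits on success
        conn.executemany(
            "UPDATE connections SET bar = ? "
            "WHERE client_tuple = ? AND origin_tuple = ?", updates)
    return len(updates)


# Made-up schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE connections (client_tuple TEXT, origin_tuple TEXT, bar TEXT)")
conn.executemany(
    "INSERT INTO connections (client_tuple, origin_tuple) VALUES (?, ?)",
    [("10.0.0.1:5000", "192.168.1.1:80"), ("10.0.0.2:5001", "192.168.1.2:80")])
conn.commit()
sample_lines = [
    "10.0.0.1:5000 -> 192.168.1.1:80 foo bar 42, ok",
    "10.0.0.2:5001 -> 192.168.1.2:80 no keyword, bar 7, ok",
]
update_bars(conn, sample_lines)
```

The inner loop still runs once per tuple for every line that contains 'foo', so how much this helps depends on how many lines contain the keyword.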

Answer 3

For the for loops, you can use itertools, and the if statements can be collapsed into a single statement, like this:

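import itertools

for line, tup in itertools.product(f, tuples):
    if 'foo' in line and tup[0] in line and tup[1] in line:
        bar_value = re.findall(r'(?<=bar\s).+?(?=\,)', line)
        c.execute("UPDATE connections SET bar = ? "
                  "WHERE client_tuple = ? AND origin_tuple = ?",
                  (bar_value[0], tup[0], tup[1]))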
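One thing worth checking before relying on this: itertools.product changes how the loop is written, not how many pairs are visited. A small sketch with made-up data (the lists below are placeholders, not the real file or query result):

```python
import itertools

# Placeholder data standing in for the file lines and the SQL rows.
lines = ["line a", "line b", "line c"]
tuples = [("c1", "o1"), ("c2", "o2")]

# The explicit nested loops, written as a comprehension.
nested = [(line, tup) for line in lines for tup in tuples]

# The flattened form from this answer.
flat = list(itertools.product(lines, tuples))

print(nested == flat)  # same pairs, in the same order
print(len(flat))       # len(lines) * len(tuples)
```

product() also consumes its input iterables completely before yielding anything, so passing the file object reads the whole file into memory, and the 'foo' test is evaluated for every pair instead of once per line.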

Answer 1
7 votes, accepted, 2017-05-19 14:13:31

Answer 2
5 votes, 2017-05-19 14:09:11

Answer 3
1 vote, 2017-05-19 14:09:08
