What is the fastest (most efficient) way to iterate over a list and find a matching string pattern in Python?

Question

So I have a list of strings that looks like this:

['https://images.website.com/images/data/X/source1',
'https://images.website.com/articles/data/main_data/source2',
'https://images.website.com/harmony/data/Y/source3',
'https://images.website.com/files/data/Z/source4',
'https://images.website.com/pictures/data/T/source5']

I need to find the item that contains main_data.

I came up with something like this.


def func(items):
    for item in items:
        if item.find('/main_data/') != -1:  # find() returns -1 when absent
            return item
    return -1  # item not found

Is this the fastest approach when the list has 100-1000 items or even more?

Answer 1

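If you only have the 100-1000 items you mentioned, I would expect the approach you posted to finish almost instantly, so it is hard to imagine any perceptible gain from speeding it up. If you have many more items, though, using a built-in function instead of a for loop is usually faster when possible.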

def func(l):
    return next(filter(lambda s: '/main_data/' in s, l))
    # raises StopIteration if no string contains /main_data/
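
If raising StopIteration on a miss is inconvenient, the built-in next also accepts a default value as its second argument. A minimal sketch, assuming a hypothetical helper name find_first and assuming that None is an acceptable "not found" marker (neither comes from the answer above):

```python
def find_first(paths, needle='/main_data/'):
    """Return the first string containing needle, or None if there is none."""
    # The generator expression is lazy, so scanning stops at the first match.
    return next((s for s in paths if needle in s), None)


urls = [
    'https://images.website.com/images/data/X/source1',
    'https://images.website.com/articles/data/main_data/source2',
    'https://images.website.com/harmony/data/Y/source3',
]
print(find_first(urls))
print(find_first(urls, '/missing/'))
```

The caller can then test the result with "is None" instead of wrapping the call in try/except.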

Answer 2

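There are several points to make:

If you care about performance, don't use Python for that part.
If you can use multiple cores, do so. The task is embarrassingly parallel, although you would probably need to keep the threads in a pool, since there are so few items.
If you know roughly where the substring sits within each string, you can avoid scanning the whole string. The same goes for guessing where the matching string usually sits in the list.
If you have any information about properties that let you narrow the search, use it.
If you need to search many times, perhaps for different terms, you may be able to build a better data structure or index instead of running a naive search every time.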
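
As an illustration of the last point, one possible index is a dictionary from path segment to the URLs that contain it. This is only a sketch under assumptions: the name build_segment_index is hypothetical, and it matches whole path segments, which is slightly different from the substring test '/main_data/' in s (for example, it also matches a segment at the very end of the path).

```python
from collections import defaultdict
from urllib.parse import urlparse


def build_segment_index(urls):
    """Map each path segment to the URLs whose path contains it, in input order."""
    index = defaultdict(list)
    for url in urls:
        # set() so a URL is listed once per segment even if the segment repeats
        for segment in set(urlparse(url).path.split('/')):
            if segment:  # skip the empty string produced by the leading '/'
                index[segment].append(url)
    return index


urls = [
    'https://images.website.com/images/data/X/source1',
    'https://images.website.com/articles/data/main_data/source2',
    'https://images.website.com/harmony/data/Y/source3',
]
index = build_segment_index(urls)
print(index.get('main_data', []))
```

Building the index costs one pass over the list, so it only pays off when the same list is searched repeatedly; each later lookup is a single dictionary access.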

Answer 3

Your function is the fastest

def func_1(path_list):
    for item in path_list:
        if '/main_data/' in item:
            return item
    return -1

For example, with 100 elements the processing time is 0:00:00.000048, which is fine.

path_list = 99*['https://images.website.com/images/data/X/source1']
path_list.append('https://images.website.com/articles/data/main_data/source2')

from datetime import datetime

now = datetime.now()

func_1(path_list)

end = datetime.now()
print(end-now)

0:00:00.000048
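
A single datetime measurement like the one above is noisy at the microsecond scale. The sketch below repeats the measurement with the standard timeit module and compares the loop against a next-based variant. The name func_next and the repetition counts are assumptions chosen for illustration, and the numbers will vary by machine.

```python
import timeit


def func_1(path_list):
    for item in path_list:
        if '/main_data/' in item:
            return item
    return -1


def func_next(path_list):
    # Same search expressed with a lazy generator and a default of -1.
    return next((s for s in path_list if '/main_data/' in s), -1)


path_list = 99 * ['https://images.website.com/images/data/X/source1']
path_list.append('https://images.website.com/articles/data/main_data/source2')

calls = 1_000
for fn in (func_1, func_next):
    # repeat() returns one total per run; the minimum is the least noisy estimate.
    best = min(timeit.repeat(lambda: fn(path_list), number=calls, repeat=5))
    print(f'{fn.__name__}: {best / calls * 1e6:.2f} microseconds per call')
```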


3 solutions

Solution 1
2 2020-11-25 17:12:48

Solution 2
2 2020-11-25 17:14:11

Solution 3
1 accepted 2020-11-25 17:25:38
