What's the fastest (most efficient) way to iterate through a list and find a fitting string pattern in Python?

I have a list of strings that looks something like this:

['https://images.website.com/images/data/X/source1',
'https://images.website.com/articles/data/main_data/source2',
'https://images.website.com/harmony/data/Y/source3',
'https://images.website.com/files/data/Z/source4',
'https://images.website.com/pictures/data/T/source5']

I need to find the item that has main_data in it.

I thought of something like this:


def func(items):
    for item in items:
        # str.find returns -1 when the substring is absent;
        # str.index would raise ValueError instead
        if item.find('/main_data/') != -1:
            return item
    return -1  # item not found

Is this the fastest way when I have a list of 100-1000 items, and maybe more?

If you only have 100-1000 items as you mentioned, I would expect the method you already posted to complete practically instantly, so it's hard to imagine any perceivable gain from speeding things up. However, if you have many more items, it is often faster to use a built-in function instead of an explicit for loop where possible.

def func(l):
    return next(filter(lambda s: '/main_data/' in s, l))
    # raises StopIteration if no string contains /main_data/
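
If a missing match should not raise, `next` also accepts a default value as its second argument. A minimal sketch of that variant follows; the name `find_first` and the choice of `None` as the "not found" value are illustrative assumptions, not part of the answer above.

```python
def find_first(paths, needle='/main_data/'):
    """Return the first string containing needle, or None if there is none."""
    # The generator expression is lazy, so scanning stops at the first match.
    return next((s for s in paths if needle in s), None)


urls = [
    'https://images.website.com/images/data/X/source1',
    'https://images.website.com/articles/data/main_data/source2',
    'https://images.website.com/harmony/data/Y/source3',
]

print(find_first(urls))
print(find_first(urls, '/missing/'))
```

Returning `None` rather than -1 keeps the result easy to test with `is None`, but either sentinel works as long as the caller checks for it.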

There are many points to be made:

  • If performance is your concern, don't use Python for that bit.
  • If you can use several cores, do so. The task is embarrassingly parallel, although you likely need to keep the threads in a pool, because there are so few items.
  • If you roughly know where the substring is within a string, then you can avoid traversing the entire string. The same goes for guessing where in the list the matching string usually sits.
  • If you have information about some property that allows you to cut the search down, use it.
  • If you need to search several times, perhaps for different terms, you are likely able to build a better data structure or an index rather than performing the naive search every time.
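
The last point can be sketched as follows, under the assumption that every search is for a whole path segment such as main_data. The helper name `build_segment_index` is hypothetical, and a different kind of query (arbitrary substrings, for example) would need a different index.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def build_segment_index(urls):
    """Map each path segment to the URLs that contain it, in list order."""
    index = defaultdict(list)
    for url in urls:
        # set() records a URL once per segment even if the segment repeats
        for segment in set(urlsplit(url).path.split('/')):
            if segment:  # skip the empty piece before the leading slash
                index[segment].append(url)
    return index


urls = [
    'https://images.website.com/images/data/X/source1',
    'https://images.website.com/articles/data/main_data/source2',
    'https://images.website.com/harmony/data/Y/source3',
]

index = build_segment_index(urls)        # built once, up front
matches = index.get('main_data', [])     # each lookup is a dict access
print(matches[0] if matches else None)
```

Building the index costs one pass over all the strings, so it only pays off when the same list is searched more than once.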

Your function is fastest

def func_1(path_list):
    for item in path_list:
        if '/main_data/' in item:
            return item
    return -1

For example, for 100 elements, the processing time is 0:00:00.000048, which is good.

path_list = 99*['https://images.website.com/images/data/X/source1']
path_list.append('https://images.website.com/articles/data/main_data/source2')

from datetime import datetime

now = datetime.now()

func_1(path_list)

end = datetime.now()
print(end-now)

0:00:00.000048
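
A single `datetime` measurement of a call this short is mostly noise, so repeated runs with the standard `timeit` module tend to give a steadier comparison. The sketch below times the loop against the `next`/`filter` approach on the same 100-element list; `func_2` is just a label chosen here for the `filter` variant, and no timings are quoted because they depend on the machine and Python version.

```python
import timeit


def func_1(path_list):
    for item in path_list:
        if '/main_data/' in item:
            return item
    return -1


def func_2(path_list):
    # next/filter variant; the -1 default mirrors func_1 instead of raising
    return next(filter(lambda s: '/main_data/' in s, path_list), -1)


path_list = 99 * ['https://images.website.com/images/data/X/source1']
path_list.append('https://images.website.com/articles/data/main_data/source2')

results = {}
for fn in (func_1, func_2):
    # Best of 5 repeats of 1,000 calls each, converted to seconds per call
    best = min(timeit.repeat(lambda: fn(path_list), number=1_000, repeat=5))
    results[fn.__name__] = best / 1_000
    print(f'{fn.__name__}: {results[fn.__name__]:.2e} s per call')
```

Taking the minimum of several repeats reduces the influence of other processes on the measurement.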
