How can I iterate over a large list quickly?

I am trying to iterate over a large list and want a way to do it quickly, but the loop takes a long time. Is there a faster way to iterate, or is Python simply not built for this? My code snippet is:

for i in THREE_INDEX:
    if check_balanced(rc, pc):
        print('balanced')
    else:
        rc, pc = equation_suffix(rc, pc, i) 

Here THREE_INDEX has a length of 117649. Iterating over it takes around 4-5 minutes. Is there any way to iterate over it more quickly?

The equation_suffix function:

def equation_suffix(rn, pn,  suffix_list):
    len_rn = len(rn)
    react_suffix = suffix_list[: len_rn]
    prod_suffix = suffix_list[len_rn:]
    for re in enumerate(rn):
        rn[re[0]] = add_suffix(re[1], react_suffix[re[0]])
    for pe in enumerate(pn):
        pn[pe[0]] = add_suffix(pe[1], prod_suffix[pe[0]])
    return rn, pn

The check_balanced function:

def check_balanced(rl, pl):
    total_reactant = []
    total_product = []
    reactant_name = []
    product_name = []
    for reactant in rl:
        total_reactant.append(separate_num(separate_brackets(reactant)))
    for product in pl:
        total_product.append(separate_num(separate_brackets(product)))
    for react in total_reactant:
        for key in react:
            val = react.get(key)
            val_dict = {key: val}
            reactant_name.append(val_dict)
    for prod in total_product:
        for key in prod:
            val = prod.get(key)
            val_dict = {key: val}
            product_name.append(val_dict)

    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)

    for elem in enumerate(reactant_name):
        val_r = reactant_name.get(elem[1])
        val_p = product_name.get(elem[1])
        if val_r == val_p:
            if elem[0] == len(reactant_name) - 1:
                return True
        else:
            return False

I believe "iterating" over the list takes a long time because of the functions you call inside the for loop. I removed those calls to test the speed of the iteration alone, and iterating over a list of 117649 elements turns out to be very fast. Here is my test script:

import time

start_time = time.time()
new_list = [(1, 2, 3) for i in range(117649)]
end_time = time.time()
print(f"Creating the list took: {end_time - start_time}s")

start_time = time.time()
for i in new_list:
    pass
end_time = time.time()
print(f"Iterating the list took: {end_time - start_time}s")

Output:

Creating the list took: 0.005337953567504883s
Iterating the list took: 0.0035648345947265625s

Edit: time.time() returns seconds.
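To find out which call inside the loop is actually eating the time, a profiler is usually more informative than timing the bare loop. The following is a minimal sketch using the standard library's cProfile and pstats. The functions cheap_step, expensive_step and run_loop are hypothetical stand-ins for the real loop body, not the asker's code.

```python
import cProfile
import io
import pstats


def cheap_step(x):
    # Hypothetical stand-in for a fast helper.
    return x + 1


def expensive_step(x):
    # Hypothetical stand-in for a slow helper called on every iteration.
    return sum(i * i for i in range(200))


def run_loop(n):
    # Stand-in for the real loop: the iteration itself is trivial,
    # the calls inside it are what cost time.
    total = 0
    for i in range(n):
        total += cheap_step(i)
        total += expensive_step(i)
    return total


profiler = cProfile.Profile()
profiler.enable()
run_loop(2000)
profiler.disable()

# Collect the report in a string so it can be printed or inspected.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)  # show the five entries with the largest cumulative time
report = buffer.getvalue()
print(report)
```

In the report, the functions with the largest cumulative time are the ones worth optimizing. Applied to the real code, this would show how the 4-5 minutes are split between check_balanced, equation_suffix and the helpers they call.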

for loops in general aren't an issue, but using them to build (or rebuild) lists is usually slower than using list comprehensions (or, in some cases, map/filter, though those are advanced tools that often turn out to be a pessimization).

Your functions could be made significantly simpler this way, and they'd get faster to boot. Example rewrites:

def equation_suffix(rn, pn, suffix_list):
    prod_suffix = suffix_list[len(rn):]
    # Change `rn =` to `rn[:] = ` if you must modify the caller's list as in your
    # original code, not just return the modified list (which would be fine in your original code)
    rn = [add_suffix(r, suffix) for r, suffix in zip(rn, suffix_list)]  # No need to slice suffix_list; zip'll stop when rn exhausted
    pn = [add_suffix(p, suffix) for p, suffix in zip(pn, prod_suffix)]
    return rn, pn

def check_balanced(rl, pl):
    # These can be generator expressions, since they're iterated once and thrown away anyway
    total_reactant = (separate_num(separate_brackets(reactant)) for reactant in rl)
    total_product = (separate_num(separate_brackets(product)) for product in pl)
    reactant_name = []
    product_name = []
    # Use .items() to avoid repeated lookups, and concat simple listcomps to reduce calls to append
    for react in total_reactant:
        reactant_name += [{key: val} for key, val in react.items()]
    for prod in total_product:
        product_name += [{key: val} for key, val in prod.items()]

    # These calls are suspicious, and may indicate optimizations to be had on prior lines
    reactant_name = flatten_dict(reactant_name)
    product_name = flatten_dict(product_name)

    for i, (elem, val_r) in enumerate(reactant_name.items()):
        if val_r == product_name.get(elem):
            if i == len(reactant_name) - 1:
                return True
        else:
            # I'm a little suspicious of returning False the first time a single
            # key's value doesn't match. Either it's wrong, or it indicates an
            # opportunity to write short-circuiting code that doesn't have
            # to fully construct reactant_name and product_name when much of the time
            # there will be an early mismatch
            return False
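The comments above hint that reactant_name and product_name may not need to be built at all. One possible simplification, sketched here under a loud assumption: that separate_num(separate_brackets(x)) returns a dict mapping each element symbol to its atom count (the question does not show those helpers, so this is a guess). The names total_atoms and is_balanced are hypothetical. Note that this compares the totals in both directions, which is slightly stricter than the original loop, which only walks the reactant keys.

```python
from collections import Counter


def total_atoms(parsed_compounds):
    """Sum per-element counts across an iterable of {element: count} dicts."""
    totals = Counter()
    for compound in parsed_compounds:
        # Counter.update adds counts instead of overwriting existing keys.
        totals.update(compound)
    return totals


def is_balanced(parsed_reactants, parsed_products):
    # Balanced when every element has the same total on both sides.
    return total_atoms(parsed_reactants) == total_atoms(parsed_products)


# Hypothetical, already-parsed data for 2 H2 + O2 -> 2 H2O,
# with the coefficients already multiplied into the counts.
reactants = [{"H": 4}, {"O": 2}]
products = [{"H": 4, "O": 2}]
print(is_balanced(reactants, products))
```

This does not short-circuit on the first mismatch, but it removes the intermediate lists of single-key dicts and the flatten_dict step, assuming flatten_dict does nothing more than sum the counts per element.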

I'll also note that using enumerate without unpacking the result gives worse performance and more cryptic code. In this case (and many others) enumerate isn't needed, since listcomps and genexprs can accomplish the same result without knowing the index. When it is needed, always unpack: for i, elem in enumerate(...): and then using i and elem separately will run faster than for packed in enumerate(...): and using packed[0] and packed[1] (and if you have more useful names than i and elem, it'll be much more readable to boot).
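A small sketch of the difference between the packed and unpacked enumerate styles, using timeit from the standard library. The function names and the sample data are made up for illustration. Absolute timings depend on the machine and Python version, so none are quoted here.

```python
import timeit

# Hypothetical sample data, just large enough to make the timings visible.
names = ["H2", "O2", "H2O"] * 1000


def packed_style(items):
    # Indexing into the (index, value) tuple on every use.
    out = []
    for packed in enumerate(items):
        out.append((packed[0], packed[1]))
    return out


def unpacked_style(items):
    # Unpacking in the for statement gives each part its own name.
    out = []
    for index, name in enumerate(items):
        out.append((index, name))
    return out


def comprehension_style(items):
    # Same result built with a list comprehension instead of append calls.
    return [(index, name) for index, name in enumerate(items)]


# All three variants produce the same list.
assert packed_style(names) == unpacked_style(names) == comprehension_style(names)

for func in (packed_style, unpacked_style, comprehension_style):
    seconds = timeit.timeit(lambda: func(names), number=200)
    print(f"{func.__name__}: {seconds:.4f}s")
```

Beyond any speed difference, index and name say what the values are, which packed[0] and packed[1] do not.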
