Loop in Python: Do stuff before first iteration

I want to optimize.

Simple solution

connection = get_db_connection()
for item in my_iterator:
    push_item_to_db(item, connection)

Drawback:

get_db_connection() is slow. If my_iterator is empty, I want to avoid calling it.
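To try the snippets in this thread without a real database, here is a minimal sketch with hypothetical stand-ins for get_db_connection() and push_item_to_db() (the question does not show their real implementations; these only record calls). It shows the drawback above: the simple solution opens a connection even when there is nothing to push.

```python
# Hypothetical stand-ins: they only count calls and record pushed items.
calls = {"connect": 0}
pushed = []

def get_db_connection():
    """Pretend to open an expensive connection; only counts calls."""
    calls["connect"] += 1
    return object()  # opaque placeholder for a real connection

def push_item_to_db(item, connection):
    """Pretend to insert one row; just records the item."""
    pushed.append(item)

def simple_solution(my_iterator):
    connection = get_db_connection()  # runs even if my_iterator is empty
    for item in my_iterator:
        push_item_to_db(item, connection)

simple_solution([])        # nothing to push...
print(calls["connect"])    # ...but prints 1: a connection was still opened
```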

"if None" solution

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Drawback:

If there are 100k items in my_iterator, then if connection is None gets evaluated 100k times (although it is needed only once). I want to avoid this.

Perfect solution ...

  1. don't call get_db_connection() if the iterator is empty
  2. don't evaluate if connection is None: uselessly on every iteration

Any ideas?

You can do something like:

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Simple solution. Don't overthink it. Even with 100k operations, x is None is just a reference comparison that compiles to only a couple of bytecode instructions. Compared to the full TCP round trip plus disk write that happens on every insert, you really don't need to optimize this.
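To check how small the overhead is on your own machine, a rough timeit sketch follows. Here object() stands in for a real connection (an assumption for illustration), so it measures only the loop and the is None check, not any database work.

```python
import timeit

items = range(100_000)

def with_check():
    connection = None
    for item in items:
        if connection is None:     # the per-item check the question worries about
            connection = object()  # stand-in for get_db_connection()
    return connection

def without_check():
    connection = object()          # stand-in for get_db_connection()
    for item in items:
        pass
    return connection

# Each call loops over 100k items; timings vary by machine and Python version.
t_check = timeit.timeit(with_check, number=10)
t_plain = timeit.timeit(without_check, number=10)
print(f"with check:    {t_check:.4f} s")
print(f"without check: {t_plain:.4f} s")
```

Divide the difference by the one million iterations and compare it with the latency of a single database insert.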

I am not an expert in Python, but I would do something like this:

def put_items_to_database(iterator):
    # next() needs an iterator; iter() also lets callers pass a list
    iterator = iter(iterator)
    try:
        item = next(iterator)

        # We connect to the database only after we
        # know there is at least one element in the collection
        connection = get_db_connection()

        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

It is probably true that performance here is bound by the database. However, the question is about avoiding unnecessary work, and the above is a basic way of controlling precisely what happens during iteration.

Other solutions are "simpler" in some ways, but I think this one is more explicit and follows the principle of least astonishment.
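One caveat with the version above: its try block also wraps push_item_to_db(), so a StopIteration escaping from that call would silently end the loop. A minimal variant, sketched with hypothetical stand-ins for the two database functions, narrows the try to the first next() call and handles the rest with a plain for loop.

```python
connections = []
pushed = []

def get_db_connection():
    # Hypothetical stand-in: records that a connection was opened.
    connections.append("conn")
    return "conn"

def push_item_to_db(item, connection):
    # Hypothetical stand-in: records the pushed item.
    pushed.append(item)

def put_items_to_database(iterable):
    iterator = iter(iterable)  # accept lists as well as iterators
    try:
        item = next(iterator)
    except StopIteration:
        return  # empty input: never connect
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in iterator:  # remaining items, no per-item None check
        push_item_to_db(item, connection)

put_items_to_database([])
put_items_to_database(x for x in "abc")
print(len(connections), pushed)  # prints 1 ['a', 'b', 'c']
```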

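my_iterator = iter(my_iterator)  # required if my_iterator is a list or other re-iterable
for item in my_iterator:
    # First item (if any)
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        # Next items: continues from the same iterator
        push_item_to_db(item, connection)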
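The nested loop relies on both loops sharing one iterator. A small sketch with a hypothetical push_all() helper (counters stand in for the database calls) shows what happens if a list is passed without iter(): the inner loop restarts the list each time, so a "connection" is opened per item and items are pushed repeatedly.

```python
def push_all(items, wrap_in_iter=True):
    """Run the nested-for pattern and report (connections opened, items pushed).

    Hypothetical helper: a counter and a list replace the real database calls.
    """
    connections = 0
    pushed = []
    my_iterator = iter(items) if wrap_in_iter else items
    for item in my_iterator:
        connections += 1          # stand-in for get_db_connection()
        pushed.append(item)       # first item
        for item in my_iterator:  # shares the outer loop's position only for a true iterator
            pushed.append(item)
    return connections, pushed

print(push_all([]))                                # prints (0, [])
print(push_all([1, 2, 3]))                         # prints (1, [1, 2, 3])
print(push_all([1, 2, 3], wrap_in_iter=False)[0])  # prints 3: one "connection" per item
```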

Solution 1

This works without a while True loop.

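my_iterator = iter(my_iterator)  # next() requires an iterator
try:
    item = next(my_iterator)  # keep the first item instead of discarding it
    connection = get_db_connection()
    push_item_to_db(item, connection)
except StopIteration:
    pass
for item in my_iterator:
    push_item_to_db(item, connection)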

Solution 2

If you know that the iterator never yields None (or some other specific value), you can take advantage of the default argument of next():

my_iterator = iter(my_iterator)
item = next(my_iterator, None)  # None here means "no items"
if item is not None:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 3

If you cannot guarantee a value that the iterator never yields, you can use a sentinel object instead.

my_iterator = iter(my_iterator)
sentinel = object()  # unique object that the iterator cannot yield
item = next(my_iterator, sentinel)
if item is not sentinel:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)
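The difference between Solutions 2 and 3 only shows up when None is a legitimate item. This sketch isolates the two emptiness checks (the function names are hypothetical): with None as the default, a leading None item is mistaken for an empty iterator, while the sentinel check is only fooled by real exhaustion.

```python
_SENTINEL = object()  # unique: no iterator can yield this exact object

def has_first_item_with_none(iterator):
    # Solution 2 style check: wrongly treats a leading None item as "empty".
    return next(iterator, None) is not None

def has_first_item_with_sentinel(iterator):
    # Solution 3 style check: reports "empty" only when the iterator is exhausted.
    return next(iterator, _SENTINEL) is not _SENTINEL

print(has_first_item_with_none(iter([None, 1])))      # prints False (wrong)
print(has_first_item_with_sentinel(iter([None, 1])))  # prints True
print(has_first_item_with_sentinel(iter([])))         # prints False
```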

Solution 4

Using itertools.chain():

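from itertools import chain

my_iterator = iter(my_iterator)  # the outer loop must share the iterator with chain()
for first_item in my_iterator:
    connection = get_db_connection()
    # Put the first item back in front of the remaining ones
    for item in chain([first_item], my_iterator):
        push_item_to_db(item, connection)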
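The sentinel and chain() ideas can be combined into a small reusable helper. peek() below is a hypothetical name, not part of itertools: it returns None for an empty input, otherwise an iterator that still includes the first item, so the caller can connect once and then loop normally.

```python
from itertools import chain

def peek(iterable):
    """Return None for an empty iterable, else an iterator over all of its items.

    Hypothetical helper: takes the first item, then re-attaches it with chain().
    """
    iterator = iter(iterable)
    sentinel = object()  # cannot collide with any real item
    first = next(iterator, sentinel)
    if first is sentinel:
        return None
    return chain([first], iterator)

items = peek(x * 10 for x in range(3))
if items is not None:
    # This is where get_db_connection() would be called, exactly once.
    print(list(items))  # prints [0, 10, 20]
print(peek([]))  # prints None
```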

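You could check the length before the entire section of code. Note that this only works if my_iterator is actually a sized container such as a list; a generator or other plain iterator has no len() and raises TypeError.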

if len(my_iterator) > 0:
    connection = get_db_connection()
    for item in my_iterator:
        push_item_to_db(item, connection)
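Whether the len() check applies depends on the object's type. A quick sketch with a hypothetical supports_len() helper shows that a list supports len() while a generator does not; for sized containers, plain truthiness is the idiomatic emptiness test.

```python
def supports_len(obj):
    """Return True if len(obj) works (lists, tuples, dicts...), False otherwise."""
    try:
        len(obj)
    except TypeError:
        return False
    return True

numbers = [1, 2, 3]
squares = (n * n for n in numbers)  # a generator: no len()

print(supports_len(numbers))  # prints True
print(supports_len(squares))  # prints False

if numbers:  # idiomatic emptiness check for sized containers
    print("list is non-empty, safe to connect")
```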

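Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original URL. For any questions, contact yoyou2525@163.com.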
