Looping in Python: do something before the first iteration

Question

I want to optimize.

Simple solution

connection = get_db_connection()
for item in my_iterator:
    push_item_to_db(item, connection)

Drawback:

get_db_connection() is slow. If my_iterator is empty, I want to avoid calling it.

"if None" solution

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Drawback:

If there are 100k items in my_iterator, then if connection is None is evaluated 100k times, although it is needed only once. I want to avoid this.

Perfect solution ...

don't call get_db_connection() if the iterator is empty
don't evaluate if connection is None: uselessly on every iteration

Any ideas?

Answer 1

You can do something like:

connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)

Simple solution. Don't overthink it. Even with 100k iterations, x is None is just a reference comparison that takes one Python opcode. You really don't need to optimize this compared to the full TCP round trip plus disk write that happens on every insert.
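To check this loop against the two requirements from the question, here is a minimal self-contained sketch. The functions get_db_connection and push_item_to_db are stub stand-ins (assumptions, since the real ones are not shown in the question), and push_all is a hypothetical wrapper name used only for illustration.

```python
calls = {"connect": 0}
pushed = []

def get_db_connection():
    # Stub for the slow real call (assumption): only counts invocations.
    calls["connect"] += 1
    return object()

def push_item_to_db(item, connection):
    # Stub for the real insert (assumption): records the item in a list.
    pushed.append(item)

def push_all(items):
    # Hypothetical wrapper around the loop from the answer.
    connection = None
    for item in items:
        if connection is None:
            connection = get_db_connection()
        push_item_to_db(item, connection)

push_all(iter([]))
print(calls["connect"])  # 0: empty input never connects

push_all(iter(["a", "b", "c"]))
print(calls["connect"])  # 1: connected exactly once
```

The connection is opened lazily on the first item, so an empty input costs nothing beyond creating the loop.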

Answer 2

I am not an expert in Python, but I would do something like this:

def put_items_to_database(iterator):
    try:
        item = next(iterator)

        # We connect to the database only after we
        # know there is at least one element in the collection
        connection = get_db_connection()

        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

It is probably true that performance is bound by the database here. However, the question is about avoiding unnecessary work, and the above is a basic way of controlling precisely what happens during iteration.

Other solutions are "simpler" in some ways, but I think this one is more explicit and follows the principle of least astonishment.
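One detail worth noting: next() only accepts an iterator, so passing a plain list to a function written this way raises TypeError. Below is a minimal sketch of a variant that accepts any iterable by calling iter() first. The two database functions are stubs (assumptions), and the parameter name iterable is introduced here for illustration.

```python
connections_opened = 0
pushed = []

def get_db_connection():
    # Stub for the slow real connection call (assumption).
    global connections_opened
    connections_opened += 1
    return "connection"

def push_item_to_db(item, connection):
    # Stub: records the item instead of writing to a database.
    pushed.append(item)

def put_items_to_database(iterable):
    # iter() turns lists, tuples and generators into an iterator;
    # calling it on an iterator returns the iterator itself.
    iterator = iter(iterable)
    try:
        item = next(iterator)
        # Reached only if there is at least one item.
        connection = get_db_connection()
        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

put_items_to_database([])         # empty list: no connection opened
put_items_to_database([1, 2, 3])  # list input: one connection opened
```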

Answer 3

for item in my_iterator:
    # First item (if any)
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        # Next items
        push_item_to_db(item, connection)
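Note that this nested loop relies on my_iterator being a real iterator, so that the inner loop continues where the outer one stopped. With a list, every for statement starts a fresh pass over the data and items are pushed repeatedly. The sketch below shows the difference; push_all is a hypothetical wrapper, and a plain list stands in for the database.

```python
def push_all(my_iterator, pushed):
    # Same shape as the nested loop above, with a list instead of a database.
    for item in my_iterator:
        # First item (if any)
        pushed.append(item)
        for item in my_iterator:
            # Next items
            pushed.append(item)

from_iterator = []
push_all(iter([1, 2, 3]), from_iterator)
print(from_iterator)  # [1, 2, 3]

from_list = []
push_all([1, 2, 3], from_list)
print(len(from_list))  # 12: each outer step re-runs the whole inner loop
```

Wrapping the input with iter() before the outer loop avoids the problem for lists and other re-iterable containers.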

Answer 4

Solution 1

This works without a while True loop.

try:
    item = next(my_iterator)
    connection = get_db_connection()
    push_item_to_db(item, connection)
except StopIteration:
    pass
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 2

If you know that the iterator never yields None (or some other specific value), you can take advantage of the default argument of next():

item = next(my_iterator, None)
if item is not None:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)

Solution 3

If you cannot guarantee a value that the iterator never yields, you can use a sentinel.

sentinel = object()
item = next(my_iterator, sentinel)
if item is not sentinel:
    connection = get_db_connection()
    push_item_to_db(item, connection)
for item in my_iterator:
    push_item_to_db(item, connection)
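To illustrate why the sentinel matters: when the first item is itself None, a None default cannot tell "empty" apart from "first item is None". The sketch below wraps the sentinel approach in a function. The names push_all, get_connection, push and _MISSING are hypothetical, and the callables are passed in only to keep the example self-contained.

```python
_MISSING = object()  # unique marker that no iterator can yield

def push_all(iterable, get_connection, push):
    # Returns the number of pushed items; connects only if there is one.
    iterator = iter(iterable)
    first = next(iterator, _MISSING)
    if first is _MISSING:
        return 0  # empty input: never connect
    connection = get_connection()
    push(first, connection)
    count = 1
    for item in iterator:
        push(item, connection)
        count += 1
    return count

pushed = []
n = push_all([None, "a"], lambda: "conn", lambda item, conn: pushed.append(item))
print(n)       # 2
print(pushed)  # [None, 'a']
```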

Solution 4

Using itertools.chain():

from itertools import chain

for first_item in my_iterator:
    connection = get_db_connection()
    for item in chain([first_item], my_iterator):
        push_item_to_db(item, connection)

Answer 5

You could check the number of items before the entire section of code:

if len(my_iterator) > 0:
    connection = get_db_connection()
    for item in my_iterator:
        push_item_to_db(item, connection)
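This check works only for sized containers such as lists. Generators and most other iterators do not define __len__, so len() raises TypeError on them. A short sketch, using a hypothetical generator named items:

```python
def items():
    # Hypothetical generator standing in for my_iterator.
    yield 1
    yield 2

try:
    len(items())
    has_len = True
except TypeError:
    # Generators have no __len__, so len() is not available.
    has_len = False

print(has_len)          # False
print(len([1, 2]) > 0)  # True: lists are sized
```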

Answer scores and timestamps

Answer 1: score 5, 2016-04-04 10:15:20
Answer 2: score 2, 2016-04-04 10:33:53
Answer 3: score 2, 2016-04-04 13:53:00
Answer 4: score 1, 2016-04-04 12:09:05
Answer 5: score -1, 2016-04-04 10:22:34
