Loop in Python: Do stuff before first iteration
I want to optimize this:
connection = get_db_connection()
for item in my_iterator:
    push_item_to_db(item, connection)
Drawback:
get_db_connection() is slow. If my_iterator is empty, I want to avoid calling it.
connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)
Drawback:
If there are 100k items in my_iterator, then if connection is None gets evaluated 100k times (although it is needed only once). I want to avoid this.
In short: don't call get_db_connection() if the iterator is empty, and don't evaluate if connection is None: uselessly on every iteration. Any ideas?
You can do something like:
connection = None
for item in my_iterator:
    if connection is None:
        connection = get_db_connection()
    push_item_to_db(item, connection)
Simple solution. Don't overthink it. Even with 100k iterations, x is None is just a reference comparison, essentially a single bytecode operation. You really don't need to optimize this compared to the full TCP round trip plus disk write that happens on every insert.
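To put the cost of the check in perspective, here is a quick measurement sketch using the standard timeit module. The connection variable is just a placeholder object, not a real database connection, and absolute timings depend on the machine and Python version, so no numbers are quoted here.

```python
import timeit

# Time 100k evaluations of the "connection is None" check in isolation.
# A placeholder object stands in for an already-open connection.
setup = "connection = object()"
elapsed = timeit.timeit("connection is None", setup=setup, number=100_000)
print(f"100k 'is None' checks took {elapsed * 1000:.3f} ms")
```

Compare the printed figure with the latency of a single real insert to judge whether the check matters.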
I am not an expert in Python, but I would do something like this:
def put_items_to_database(items):
    iterator = iter(items)  # next() needs an iterator, so accept any iterable
    try:
        item = next(iterator)
        # We connect to the database only after we
        # know there is at least one element in the collection
        connection = get_db_connection()
        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass
It is probably true that performance here is dominated by the database. However, the question is about finding a way to avoid doing unnecessary work, and the above is a basic way of controlling precisely what happens during iteration.
Other solutions are "simpler" in some ways, but I think this one is more explicit and follows the principle of least astonishment.
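A runnable sketch of the put_items_to_database approach. The get_db_connection and push_item_to_db functions here are hypothetical stand-ins (the real ones are not shown in the question): each fake "connection" is a plain list recorded in opened, so you can check how many connections were created and what was pushed.

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []  # a plain list plays the role of a DB connection
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def put_items_to_database(items):
    iterator = iter(items)
    try:
        item = next(iterator)
        connection = get_db_connection()  # only reached if there is an item
        while True:
            push_item_to_db(item, connection)
            item = next(iterator)
    except StopIteration:
        pass

put_items_to_database([])        # empty input: no connection is opened
put_items_to_database(range(3))  # three items: exactly one connection
print(opened)  # → [[0, 1, 2]]
```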
my_iterator = iter(my_iterator)  # both loops must share one iterator
for item in my_iterator:
    # First item (if any)
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        # Next items
        push_item_to_db(item, connection)
This works without a while True loop.
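The nested-loop trick relies on the outer and inner loops consuming the same iterator: the inner loop exhausts it, so the outer loop body runs at most once. The sketch below, using hypothetical stand-ins for the database functions, contrasts a version that loops over a plain list directly (each loop restarts the list, so it reconnects and pushes duplicates) with one that calls iter() first.

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def push_all_naive(items):
    # Broken for lists: each "for" starts a fresh pass over the list.
    for item in items:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in items:
            push_item_to_db(item, connection)

def push_all(items):
    items = iter(items)  # both loops now consume the same iterator
    for item in items:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in items:
            push_item_to_db(item, connection)

push_all_naive([1, 2])
naive_opened = opened.copy()
opened.clear()
push_all([1, 2])
print(naive_opened)  # → [[1, 1, 2], [2, 1, 2]]
print(opened)        # → [[1, 2]]
```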
try:
    item = next(my_iterator)  # keep the first item instead of discarding it
except StopIteration:
    pass  # empty iterator: never connect
else:
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        push_item_to_db(item, connection)
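The try/except/else pattern wrapped in a function, as a sketch with hypothetical stand-ins for the database calls. Keeping only next() inside the try block means a StopIteration raised by anything else is not silently swallowed, and calling iter() first lets the function accept lists as well as generators.

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def push_all(items):
    iterator = iter(items)
    try:
        item = next(iterator)
    except StopIteration:
        pass  # empty input: never connect
    else:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in iterator:
            push_item_to_db(item, connection)

push_all(x * 10 for x in range(3))  # generator yielding 0, 10, 20
push_all(iter([]))                  # empty iterator: no connection
print(opened)  # → [[0, 10, 20]]
```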
If you know that the iterator never yields None (or some other known value), you can take advantage of the default argument of next():
item = next(my_iterator, None)
if item is not None:
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        push_item_to_db(item, connection)
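Why the "never yields None" condition matters: if the first item happens to be None, it is indistinguishable from an empty iterator, and the whole batch is silently skipped. A small sketch with hypothetical stand-ins for the database calls:

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def push_all(items):
    iterator = iter(items)
    item = next(iterator, None)  # None doubles as the "empty" marker
    if item is not None:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in iterator:
            push_item_to_db(item, connection)

push_all(["a", "b"])
push_all([None, "c"])  # first item is None: "c" is never pushed
print(opened)  # → [['a', 'b']]
```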
If you cannot guarantee a value that the iterator never yields, you can use a sentinel object:
sentinel = object()
item = next(my_iterator, sentinel)
if item is not sentinel:
    connection = get_db_connection()
    push_item_to_db(item, connection)
    for item in my_iterator:
        push_item_to_db(item, connection)
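A sketch of the sentinel version as a reusable function, again with hypothetical stand-ins for the database calls. Because a fresh object() is identical to nothing else, None is now treated as an ordinary item, while an empty input still opens no connection.

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

_SENTINEL = object()  # unique marker that no iterator can yield by accident

def push_all(items):
    iterator = iter(items)
    item = next(iterator, _SENTINEL)
    if item is not _SENTINEL:
        connection = get_db_connection()
        push_item_to_db(item, connection)
        for item in iterator:
            push_item_to_db(item, connection)

push_all([None, "c"])  # None is pushed like any other item
push_all([])           # empty input: no connection
print(opened)  # → [[None, 'c']]
```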
Using itertools.chain():
from itertools import chain

my_iterator = iter(my_iterator)  # chain() must continue the same iterator, not restart it
for first_item in my_iterator:
    connection = get_db_connection()
    for item in chain([first_item], my_iterator):
        push_item_to_db(item, connection)
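The chain() approach as a sketch function with hypothetical stand-ins for the database calls. chain([first_item], iterator) puts the already-consumed first item back in front of the remaining ones; once the inner loop exhausts the shared iterator, the outer loop ends, so the connection is opened at most once.

```python
from itertools import chain

opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def push_all(items):
    iterator = iter(items)
    for first_item in iterator:
        connection = get_db_connection()
        # Re-attach the first item, then continue with the rest.
        for item in chain([first_item], iterator):
            push_item_to_db(item, connection)

push_all(n for n in range(4) if n % 2)  # generator yielding 1, 3
push_all([])                            # empty input: no connection
print(opened)  # → [[1, 3]]
```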
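If my_iterator is actually a sized collection such as a list, you could check its length before the entire section of code (this does not work for real iterators or generators, which have no len()):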
if len(my_iterator) > 0:
    connection = get_db_connection()
    for item in my_iterator:
        push_item_to_db(item, connection)
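A sketch showing where the length check applies: it works for a list, but len() raises TypeError for a generator, which is exactly the lazy case the question is about. The database functions are hypothetical stand-ins.

```python
opened = []  # every fake connection ever created, kept for inspection

def get_db_connection():
    """Hypothetical stand-in for the slow call; records each connection."""
    connection = []
    opened.append(connection)
    return connection

def push_item_to_db(item, connection):
    """Hypothetical stand-in: 'inserts' an item by appending it."""
    connection.append(item)

def push_all_sized(items):
    if len(items) > 0:  # only valid for sized collections
        connection = get_db_connection()
        for item in items:
            push_item_to_db(item, connection)

push_all_sized([1, 2])

error = None
try:
    push_all_sized(x for x in [1, 2])  # generators have no len()
except TypeError as exc:
    error = exc

print(type(error).__name__)  # → TypeError
print(opened)                # → [[1, 2]]
```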