
Python: get number of items in generator without storing the items

I have a generator for a large set of items. I want to iterate through them once, outputting them to a file. However, with the file format I currently have, I first have to output the number of items I have. I don't want to build a list of the items in memory, as there are too many of them and that would take a lot of time and memory. Is there a way to iterate through the generator, getting its length, but somehow be able to iterate through it again later, getting the same items?

If not, what other solution could I come up with for this problem?

If you can figure out how to write a formula that calculates the size from the parameters that control the generator, do that. Otherwise, I don't think you would save much time.

Include the generator here, and we'll try to do it for you!
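As a rough illustration of the formula approach, here is a minimal sketch. The question does not show the real generator, so `triples` and `triples_count` are hypothetical stand-ins: a generator that yields every k-element combination of its input, and a closed-form count for it.

```python
import itertools
import math


def triples(items, k=3):
    """Hypothetical generator: yields every k-element combination of items."""
    yield from itertools.combinations(items, k)


def triples_count(n_items, k=3):
    """Closed-form size of the generator above: n choose k (Python 3.8+)."""
    return math.comb(n_items, k)


items = range(10)
# The count comes from the parameters alone; the generator is never run for it.
expected = triples_count(len(items))
```

The count can then be written to the file header before the single pass over `triples(items)`. Whether such a formula exists depends entirely on how the real generator is built.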

This cannot be done. Once a generator is exhausted, it needs to be reconstructed in order to be used again. If the number of items is known ahead of time, it is possible to define the __len__() method on an iterator object, and then len() can be called against that object.
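A minimal sketch of the __len__() idea, assuming the item count really can be derived from the constructor arguments. The class `CountedSquares` is hypothetical. It is written as a re-iterable object whose __iter__() builds a fresh generator on every call, which also takes care of the reconstruction step.

```python
class CountedSquares:
    """Hypothetical iterable whose length is known from its parameter."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        # Known ahead of time, so no iteration is needed to answer len().
        return self.n

    def __iter__(self):
        # A new generator is created on every call, so the object can be
        # iterated more than once.
        return (i * i for i in range(self.n))


squares = CountedSquares(5)
header_count = len(squares)  # available before any item is produced
```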

I don't think that is possible for any generalized iterator. You will need to figure out how the generator was originally constructed and then regenerate it for the final pass.
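The regenerate-for-the-final-pass idea could look roughly like this. `make_items` is a hypothetical factory standing in for however the generator was originally constructed, and the one-item-per-line text layout is an assumption. The sketch only works if the factory yields the same items on every call.

```python
def make_items():
    """Hypothetical factory: rebuilds the generator from scratch on each call."""
    return (f"item-{i}" for i in range(1000))


def write_with_count(out, factory):
    """Write the item count first, then the items, using two passes."""
    # First pass: count the items without keeping any of them.
    count = sum(1 for _ in factory())
    out.write(f"{count}\n")
    # Second pass: a fresh generator yields the same items again.
    for item in factory():
        out.write(f"{item}\n")
    return count
```

This costs two full runs of the generator, so it trades time for memory.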

Alternatively, you could write out a dummy size to your file, write the items, and then reopen the file for modification and correct the size in the header.

If your file is a binary format, this could work quite well, since the number of bytes used for the size is the same regardless of what the actual size is. If it is a text format, you might have to add some extra length to the file if you weren't able to pad the dummy size to cover all cases. See this question for a discussion on inserting and rewriting in a text file using Python.
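A minimal sketch of the placeholder approach for a binary file. The layout is an assumption, since the question does not describe the real format: an 8-byte little-endian count in the header, followed by length-prefixed UTF-8 strings. Here the file stays open and the code seeks back to the start instead of reopening it.

```python
import struct

# Assumed header layout: one 8-byte little-endian unsigned integer.
HEADER = struct.Struct("<Q")


def write_items(path, items):
    """Write a placeholder count, stream the items, then patch the count."""
    count = 0
    with open(path, "wb") as f:
        f.write(HEADER.pack(0))  # dummy size, same width as the real one
        for item in items:
            data = item.encode("utf-8")
            f.write(struct.pack("<I", len(data)))  # 4-byte length prefix
            f.write(data)
            count += 1
        f.seek(0)  # back to the header
        f.write(HEADER.pack(count))  # overwrite the dummy size in place
    return count
```

The generator is consumed exactly once and nothing is held in memory. For a text format, a fixed-width field such as `f"{count:010d}"` would play the same role, provided the width is large enough for every possible count.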

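Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.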

 