
Peewee ORM:如何有效地迭代大型结果集


I want my program to start processing rows as soon as they are received from the MySQL server (many rows, slow connection).

The docs recommend `MyModel.select().iterator()` for querying lots of rows.

However, it seems that the DB server sends all the data before the iterator yields its first result (verified with tcpdump in another terminal).

I tried accomplishing this with the raw DB drivers MySQLdb and pymysql, but there the results seem to get buffered as well.

Is this possible at all? How do other Peewee devs handle iterating over large datasets?

Willem, for this problem, PostgreSQL provides named cursors (or server-side cursors), which are supported by peewee:

http://docs.peewee-orm.com/en/latest/peewee/playhouse.html#server-side-cursors

I am not super familiar with MySQL, but perhaps it provides something similar?
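MySQL's Python drivers do in fact ship an unbuffered cursor class: both MySQLdb and pymysql provide `SSCursor`, which makes the server stream rows as the client fetches them instead of buffering the whole result set first. A minimal sketch with raw pymysql (the connection parameters and table are placeholders, not from the original question):

```python
import pymysql
import pymysql.cursors

# Placeholder credentials -- adjust for your server.
conn = pymysql.connect(
    host="localhost",
    user="user",
    password="secret",
    database="mydb",
    cursorclass=pymysql.cursors.SSCursor,  # unbuffered, server-side streaming
)
try:
    with conn.cursor() as cur:
        cur.execute("SELECT id, payload FROM my_table")
        # Rows arrive incrementally; the connection stays busy
        # until the result set is fully consumed or the cursor closed.
        for row in cur:
            print(row)  # stand-in for your row handler
finally:
    conn.close()
```

Note the usual SSCursor caveat: you cannot issue other queries on the same connection until the streaming result is exhausted.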

If not, you can always use a chunked iterator.
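A chunked iterator can be built on keyset pagination: repeatedly fetch the next N rows with an id greater than the last one seen, so each round trip stays small and cheap (no growing OFFSET). A sketch assuming rows have a monotonically increasing `id`; `fetch_after` here is an in-memory stand-in for a query like `MyModel.select().where(MyModel.id > last_id).order_by(MyModel.id).limit(n)`:

```python
def iter_chunked(fetch_after, chunk_size=1000):
    """Yield rows one at a time, fetching chunk_size rows per round trip.

    fetch_after(last_id, limit) must return rows ordered by id,
    restricted to id > last_id (keyset pagination).
    """
    last_id = 0
    while True:
        chunk = fetch_after(last_id, chunk_size)
        if not chunk:
            return
        for row in chunk:
            yield row
        last_id = chunk[-1]["id"]


# Demo with in-memory data standing in for a database table.
DATA = [{"id": i, "value": i * i} for i in range(1, 11)]

def fetch_after(last_id, limit):
    return [r for r in DATA if r["id"] > last_id][:limit]

rows = list(iter_chunked(fetch_after, chunk_size=3))
```

Each chunk is fully buffered client-side, but memory is bounded by `chunk_size` rather than by the total result set, and processing starts after the first chunk arrives.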
