Is there a way to combine streaming data retrieval with Hibernate?

For the purposes of handling very large collections (by "very large" I just mean "likely to throw an OutOfMemoryError"), Hibernate seems problematic, because collection retrieval is normally done in one block, i.e. List values = session.createQuery("from X").list(), where you grab all N million values at once and only then process them.

What I'd prefer is to retrieve the values as an iterator, so that I grab 1000 or so at a time (or whatever is a reasonable page size). Apart from writing my own iteration, which seems likely to be reinventing the wheel, is there a Hibernate-native way to handle this?

Actually, scroll() may be better than iterate() in this situation (both are called on the query returned by session.createQuery(), not on the session itself). iterate() runs one query to get all the ids and then fetches the full items one by one as you process them. scroll() uses the underlying JDBC scroll functionality, which retrieves full objects but keeps a cursor open to the database. With scroll() you can set the fetch size (Query.setFetchSize()) to find the best number of rows to return at a time. If loading the N million ids still takes too much memory, scroll() is the answer, and I suspect it will be more efficient in either case.

Neither is going to close the session automatically.

You can do:

Iterator iter = session.createQuery("from X").iterate();
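
For comparison, the hand-rolled paging the question mentions ("writing my own iteration") could look roughly like the sketch below. It is not part of Hibernate: PagedIterator and its fetchPage callback are hypothetical names introduced here, and the sketch assumes the callback returns rows in a stable order and never returns more than the requested number of rows.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.BiFunction;

/**
 * Hypothetical helper (not a Hibernate class): walks a large result set
 * one page at a time, so only pageSize rows are held by the iterator.
 *
 * With Hibernate, the fetcher might be written as (an assumption, adapt to
 * your mapping; "id" is a placeholder property used for stable ordering):
 *   (offset, limit) -> session.createQuery("from X order by id")
 *                             .setFirstResult(offset)
 *                             .setMaxResults(limit)
 *                             .list()
 */
final class PagedIterator<T> implements Iterator<T> {
    // (offset, limit) -> the rows in that window
    private final BiFunction<Integer, Integer, List<T>> fetchPage;
    private final int pageSize;
    private List<T> page = Collections.emptyList();
    private int index = 0;          // position inside the current page
    private int offset = 0;         // number of rows fetched so far
    private boolean exhausted = false;

    PagedIterator(BiFunction<Integer, Integer, List<T>> fetchPage, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        this.fetchPage = fetchPage;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        // Fetch the next page only when the current one is used up.
        if (index >= page.size() && !exhausted) {
            page = fetchPage.apply(offset, pageSize);
            index = 0;
            offset += page.size();
            // A short (or empty) page means there is nothing left to fetch.
            exhausted = page.size() < pageSize;
        }
        return index < page.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return page.get(index++);
    }
}
```

Each page here is a separate query, so unlike scroll() no cursor stays open between pages; the trade-off is that rows inserted or deleted while you iterate can shift the page boundaries. With Hibernate you would probably also want to call session.clear() between pages, because the session keeps a reference to every entity it has loaded.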
