
How does Hibernate's batch-fetching algorithm work?

Original post: 2010-08-12 15:07:12. Tags: java, algorithm, optimization, java-ee, fetching-strategy

I found this description of the batch-fetching algorithm in "Java Persistence with Hibernate" (Manning):

What is the real batch-fetching algorithm? (...) Imagine a batch size of 20 and a total number of 119 uninitialized proxies that have to be loaded in batches. At startup time, Hibernate reads the mapping metadata and creates 11 batch loaders internally. Each loader knows how many proxies it can initialize: 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1. The goal is to minimize the memory consumption for loader creation and to create enough loaders that every possible batch fetch can be produced. Another goal is to minimize the number of SQL SELECTs, obviously. To initialize 119 proxies Hibernate executes seven batches (you probably expected six, because 6 x 20 > 119). The batch loaders that are applied are five times 20, one time 10, and one time 9, automatically selected by Hibernate.

But I still don't understand how it works.

Why 11 batch loaders?
Why can the batch loaders initialize 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 proxies?

If anybody could present a step-by-step algorithm ... :)

2 answers

Using a small, fixed set of batch sizes helps avoid creating a large number of different prepared statements.

Each query (prepared statement) needs to be parsed, and its execution plan needs to be calculated and cached by the database. This process may be much more expensive than actually executing a query whose statement has already been cached.

A large number of different statements may also push other cached statements out of the cache, thus degrading overall application performance.

Also, since a hard parse is generally very expensive, it is usually faster to execute multiple cached prepared statements (including multiple database round trips) than to parse and execute a new one. So, besides the obvious benefit of reducing the number of different statements, it may actually be faster to retrieve all 119 entities by executing several cached statements (seven batches in the book's example, drawn from the 11 loaders) than to create and execute a single new statement that contains all 119 ids.
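To make the "number of different statements" point concrete, here is a minimal sketch. It assumes each batch loader corresponds to one SQL string with a fixed number of placeholders in an IN list; the table name `item`, the column `id`, and the class and method names are hypothetical and not taken from Hibernate.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative sketch only: the SQL shape and all names here are assumptions.
class StatementShapeSketch {

    // Builds a SELECT with one "?" placeholder per id in the IN list.
    static String selectByIds(int idCount) {
        StringBuilder sql = new StringBuilder("select * from item where id in (");
        for (int i = 0; i < idCount; i++) {
            sql.append(i == 0 ? "?" : ", ?");
        }
        return sql.append(")").toString();
    }

    public static void main(String[] args) {
        // Loader sizes quoted from the book for a batch size of 20.
        int[] loaderSizes = {20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        Set<String> distinctStatements = new LinkedHashSet<>();
        for (int size : loaderSizes) {
            distinctStatements.add(selectByIds(size));
        }
        // Each loader size yields exactly one statement text the database has to parse.
        System.out.println(distinctStatements.size() + " distinct statements");
        System.out.println(selectByIds(3));
    }
}
```

With a fixed set of sizes, the database only ever sees that many distinct statement texts for this entity, however many proxies are loaded.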

As already mentioned in the comments, Hibernate invokes the ArrayHelper.getBatchSizes method to determine the batch sizes for a given maximum batch size.
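The sizes quoted in this thread (20, 10, 9, ..., 1 for a maximum of 20) can be reproduced by a simple rule: halve the size while the result stays at or above 10, then count down from 10 to 1. The following is a minimal sketch under that assumption. It is not copied from Hibernate, the real ArrayHelper.getBatchSizes may differ between versions, and the class name and the helper `nextBatchSize` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; not Hibernate's source code.
class BatchSizeSketch {

    // Next smaller loader size: halve (but never go below 10) while above 10,
    // then step down by one: 9, 8, ..., 1, 0 (0 ends the sequence).
    static int nextBatchSize(int batchSize) {
        if (batchSize <= 10) {
            return batchSize - 1;
        }
        return Math.max(batchSize / 2, 10);
    }

    // Collects all loader sizes from maxBatchSize down to 1, in descending order.
    static int[] getBatchSizes(int maxBatchSize) {
        List<Integer> sizes = new ArrayList<>();
        for (int size = maxBatchSize; size >= 1; size = nextBatchSize(size)) {
            sizes.add(size);
        }
        return sizes.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // prints [20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println(Arrays.toString(getBatchSizes(20)));
        // prints [40, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
        System.out.println(Arrays.toString(getBatchSizes(40)));
    }
}
```

For a maximum batch size of 20 this yields 11 sizes, which matches the 11 batch loaders mentioned in the book.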

I couldn't find any information on the web about how Hibernate handles batch loading, but judging from your information, one could guess the following:

Why 11 batch loaders?

With a batch size of 20, if you want to minimize the number of loaders required to cover any number of uninitialized proxies, there are basically two options:

create a loader for each of 1, 2, 3, 4, 5, 6, 7, ... 20, 21, 22, 23, ... N uninitialized proxies (stupid!), OR
create a loader for every N between 1 and 9, and then create more loaders for batch_size, batch_size/2, and so on (halving recursively)

Example: for batch size 40, you would end up with loaders for 40, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 proxies.

If you have 33 uninitialized proxies, you can use the following loaders: 20, 10, 3
If you have 119 uninitialized proxies, you can use the following loaders: 40 (x2), 20, 10, 9
...
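The loader choices listed above, and the book's 5 x 20 + 10 + 9 for a batch size of 20, are all consistent with a greedy rule: repeatedly pick the largest loader that does not exceed the number of proxies still to be loaded. The sketch below assumes that rule; whether Hibernate selects loaders exactly this way is a guess, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; the greedy rule is an assumption that happens to
// reproduce the examples in this thread.
class BatchPlanSketch {

    // loaderSizes must be sorted in descending order, e.g. {20, 10, 9, ..., 1}.
    // Returns the sequence of loader sizes used to cover proxyCount proxies.
    static List<Integer> planBatches(int proxyCount, int[] loaderSizes) {
        List<Integer> plan = new ArrayList<>();
        int remaining = proxyCount;
        int i = 0;
        while (remaining > 0 && i < loaderSizes.length) {
            if (loaderSizes[i] <= remaining) {
                // This loader fits: use it and stay on the same size.
                plan.add(loaderSizes[i]);
                remaining -= loaderSizes[i];
            } else {
                // Too large for what is left: move on to the next smaller loader.
                i++;
            }
        }
        if (remaining > 0) {
            throw new IllegalArgumentException("no loader small enough for the remainder");
        }
        return plan;
    }

    public static void main(String[] args) {
        int[] sizesFor20 = {20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        int[] sizesFor40 = {40, 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1};
        // prints [20, 20, 20, 20, 20, 10, 9]
        System.out.println(planBatches(119, sizesFor20));
        // prints [40, 40, 20, 10, 9]
        System.out.println(planBatches(119, sizesFor40));
        // prints [20, 10, 3]
        System.out.println(planBatches(33, sizesFor40));
    }
}
```

With a batch size of 20, the remainder of 19 after five full batches has no matching loader, so it is split into 10 + 9. That is why the book counts seven batches rather than six.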

Why can the batch loaders initialize 20, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1 proxies?

I think the Hibernate team chose this as a balance between memory consumption and the number of loaders required to load a "common" number N of uninitialized proxies. They could have created a loader for every N between 1 and batch_size, but I suspect that the loaders have a considerable memory footprint, so this is a tradeoff. The algorithm can be something like this (educated guess):