JDBC Query vs JPA Query Performance

I am having some performance issues while reading thousands of records from the database. I noticed that a pure JDBC query is much faster than a JPA native query.

Here is the query:

select ID, COL_A, COL_B, COL_C, COL_D, COL_E, COL_F from MY_SUPER_VIEW_V v 
where 1=1 
and v.ID in (:idList)
and v.DATE_FROM <= :date
and v.DATE_TILL >= :date;

This query returns around 38,000 records.

The idList used in the IN clause has more than 1000 entries, and because I am using an Oracle database (which allows at most 1000 expressions in an IN list), it needs to be split into n queries.
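A minimal sketch of the splitting step, assuming the IDs are already in a Java list. The class name `IdChunker` is hypothetical; the limit of 1000 is Oracle's maximum number of expressions in a list (exceeding it raises ORA-01795).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: split a large ID list into sublists of at most 1000
// elements, because Oracle rejects IN lists with more than 1000 expressions.
final class IdChunker {
    static final int ORACLE_MAX_IN_LIST = 1000;

    private IdChunker() {}

    /** Returns consecutive sublists of ids, each with at most maxSize elements. */
    static <T> List<List<T>> chunk(List<T> ids, int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < ids.size(); from += maxSize) {
            int to = Math.min(from + maxSize, ids.size());
            // copy so each chunk is independent of the source list
            chunks.add(new ArrayList<>(ids.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long id = 1; id <= 2500; id++) {
            ids.add(id);
        }
        for (List<Long> chunk : chunk(ids, ORACLE_MAX_IN_LIST)) {
            // each chunk would be bound to :idList for one execution of the query
            System.out.println(chunk.size()); // 1000, 1000, 500
        }
    }
}
```

Each chunk is then bound to `:idList` in a separate execution, and the partial results are concatenated.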

Furthermore, I have a method that converts the Object[] results into my List<Entity>.
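A hypothetical sketch of such a conversion method. The entity `ScoreRow`, its fields and the per-column types are assumptions (the question only says the columns are VARCHAR2 and NUMBER), and only the first three selected columns are mapped. As the JPA timings show (a few milliseconds for the conversion), a loop like this is cheap once the rows are in memory.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical mapper from native-query rows (Object[]) to entities.
final class RowMapper {
    private RowMapper() {}

    /** Hypothetical entity holding the first three selected columns. */
    static final class ScoreRow {
        final long id;          // ID (NUMBER)
        final String colA;      // COL_A, assumed VARCHAR2
        final BigDecimal colB;  // COL_B, assumed NUMBER

        ScoreRow(long id, String colA, BigDecimal colB) {
            this.id = id;
            this.colA = colA;
            this.colB = colB;
        }
    }

    /** Converts each row of "select ID, COL_A, COL_B, ..." into a ScoreRow. */
    static List<ScoreRow> toEntities(List<Object[]> rows) {
        List<ScoreRow> result = new ArrayList<>(rows.size()); // presize: row count is known
        for (Object[] row : rows) {
            // NUMBER values typically arrive as BigDecimal; going through Number
            // also accepts other numeric types
            long id = ((Number) row[0]).longValue();
            String colA = Objects.toString(row[1], null);
            BigDecimal colB = row[2] == null ? null : new BigDecimal(row[2].toString());
            result.add(new ScoreRow(id, colA, colB));
        }
        return result;
    }
}
```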

To understand the performance issue, I created a pure JDBC query and a JPA native query and compared the results.

Here are the timings:

################ getScoresPureJDBCWithListIds ################
List of Ids retrieved. It took: 00:00:00.096 to execute query on DB using JDBC
It took: 00:00:01.180 to execute query on DB using JDBC query
Creating 24206 Scores records from DB result It took: 00:00:04.440
It took: 00:00:01.038 to execute query on DB using JDBC query
Creating 14445 Scores records from DB result It took: 00:00:04.307
################ getScoresJPANativeQueryWithListIds ################
It took: 00:06:09.450 to execute query on DB using JPA Native query
Creating 24206 Scores records from DB result It took: 00:00:00.009
It took: 00:04:04.879 to execute query on DB using JPA Native query
Creating 14445 Scores records from DB result It took: 00:00:00.007

With Hibernate statistics enabled:

################ USING FETCH_SIZE: 2000 ################
################ getSmartESGScoresPureJDBCWithListCsfLcIds ################
List of Securities CsfLcId retrieved. It took: 00:00:00.296 to execute query on DB using JDBC
It took: 00:00:11.940 to execute query on DB using JDBC query
Creating 24206 Smart Esg Scores records from DB result It took: 00:00:02.670
It took: 00:00:13.570 to execute query on DB using JDBC query
Creating 14445 Smart Esg Scores records from DB result It took: 00:00:02.553
################ getSmartESGScoresJDBCTemplateWithListCsfLcIds ################
List of Securities CsfLcId retrieved. It took: 00:00:00.087 to execute query on DB using JDBC
Creating 24206 Smart Esg Scores records from DB result It took: 00:00:04.063
Creating 14445 Smart Esg Scores records from DB result It took: 00:00:04.064
################ getSmartESGScoresJPANativeQueryAsESGenius with hint fetch size 2000 ################
2020-04-22 09:36:30.830  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    1232369 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    1448702 nanoseconds spent preparing 1 JDBC statements;
    3992364 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
List of Securities CsfLcId retrieved. It took: 00:00:00.261 to execute query on DB using JDBC
2020-04-22 09:47:23.739  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    73670 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    805772 nanoseconds spent preparing 1 JDBC statements;
    651947762290 nanoseconds spent executing 1 JDBC statements; ==> 10 minutes
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
It took: 00:10:52.898 to execute query on DB using JPA Native query
Creating 24206 Smart Esg Scores records from DB result It took: 00:00:00.018
2020-04-22 09:56:00.792  INFO 13262 --- [           main] i.StatisticalLoggingSessionEventListener : Session Metrics {
    2758010 nanoseconds spent acquiring 1 JDBC connections;
    0 nanoseconds spent releasing 0 JDBC connections;
    3096653 nanoseconds spent preparing 1 JDBC statements;
    516148003151 nanoseconds spent executing 1 JDBC statements;
    0 nanoseconds spent executing 0 JDBC batches;
    0 nanoseconds spent performing 0 L2C puts;
    0 nanoseconds spent performing 0 L2C hits;
    0 nanoseconds spent performing 0 L2C misses;
    0 nanoseconds spent executing 0 flushes (flushing a total of 0 entities and 0 collections);
    0 nanoseconds spent executing 0 partial-flushes (flushing a total of 0 entities and 0 collections)
}
It took: 00:08:37.032 to execute query on DB using JPA Native query
Creating 14445 Smart Esg Scores records from DB result It took: 00:00:00.006

For the JDBC query I can see that 1) executing the query is quite fast, but 2) processing each ResultSet element in a loop takes most of the time, about 9 seconds (00:09) in total.

For the JPA native query, on the other hand, 1) executing the query by calling query.getResultList() takes a lot of time (10:14 minutes), while 2) processing each result is quite fast. The statistics show that a huge amount of time is spent executing 1 JDBC statement. Even with FETCH_SIZE = 2000, nothing changed significantly.

Why is the JPA native query so slow compared with pure JDBC? Could it be the type conversions? In my case the columns are VARCHAR2 and NUMBER. I was expecting results identical to JDBC, but going from 8 seconds to 10 minutes is a big difference.

What can I do to improve the JPA native query?

You seem to be comparing two different queries, which quite possibly leads the database to come up with different query plans.

There are lots of ways to investigate the problem, but none of them is available to us because you don't provide a minimal reproducible example. I'll therefore suggest some options for investigating this yourself:

  • Enable debug logging for your Java application, including Hibernate and the Oracle JDBC driver, as explained in their documentation.
  • Watch where the delay is coming from: is it the database, the network, or your Java application? If in doubt, check the network traffic with Wireshark on both sides of the connection, or check Oracle's database statistics on slow/heavy queries before and after the problematic queries.
  • If the problem is a slow database, make sure your query parameters have types that match your database index.
  • If you are sure the network and database are not causing the issue and debug logging doesn't help you further, try advanced tools such as a CPU profiler, e.g. JVisualVM.
  • If you're still having problems, maybe you have an extreme memory problem, such as too little system memory causing swapping, or very frequent full garbage collections, which you can see in the garbage collection log.

Please note that if you want to compare two concepts, you must isolate the main feature and get rid of other factors that may distort the result.

So to see whether the JDBC query and the JPA native query differ in behaviour, I'd propose the following scenario:

  • use only one query with a 1000-element list

  • use a plain table instead of a view

Here is a simple setup to validate the performance. The table has 50 rows for each GRP_ID, so 1000 keys return 50K rows (see the script to set up the table below).

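// assumes session is an open Hibernate Session; params are the 1000 keys 13001..14000
List params = (13001L..14000L)
def query = session.createNativeQuery("select * from tab where grp_id in (:paramsList) ")
query.setFetchSize(2000)   // rows transferred per fetch round trip
query.setParameterList("paramsList", params)
def result = query.getResultList()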

A sample run shows this result:

 got 50000 rows in 1.388 seconds

So I think there is no need to repeat the test with plain JDBC; you will see a comparable result.

What is more interesting is to repeat the run after removing the line

query.setFetchSize(2000)

which effectively resets the fetch size to the default (20 in my case). The result for the same data is:

 got 50000 rows in 1 minutes, 0.903 seconds
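A back-of-the-envelope sketch of why the fetch size matters so much: the rows are transferred in batches of fetch-size rows, so the number of fetch round trips is roughly the row count divided by the fetch size (rounded up). The class name `FetchMath` is hypothetical, and the assumption that each round trip costs about the same latency is not measured here.

```java
// Hypothetical helper: approximate number of fetch round trips needed to
// read all rows for a given fetch size (ceiling division).
final class FetchMath {
    private FetchMath() {}

    static long roundTrips(long rows, int fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        return (rows + fetchSize - 1) / fetchSize;
    }

    public static void main(String[] args) {
        long rows = 50_000;
        System.out.println(roundTrips(rows, 20));   // 2500 round trips with the default of 20
        System.out.println(roundTrips(rows, 2000)); // 25 round trips with setFetchSize(2000)
    }
}
```

A hundredfold difference in round trips is consistent with the jump from about 1.4 seconds to about 1 minute, if network latency dominates.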

1) So the fetch size is the most probable explanation of the observed behaviour. The important thing is to check whether the JDBC driver got the right value and actually uses it; if in doubt, use an Oracle 10046 trace to see which fetch size the database is using. But for me, the above statement worked perfectly.

2) There is no substantial difference between a native JPA query and a manually written JDBC execute + fetch of a prepared statement that would explain your observation. Both execute the statement in the database, followed by a number of fetches; the number of fetches depends on the fetch size used.

3) Of course the view can also have an influence, but that would be a difference in the query, not between JDBC and JPA.

4) You didn't mention it, so I won't go into details here and will assume your view doesn't contain any CLOB columns. Those could of course play a role.

5) The last point concerns your mention of two queries: do you use two independent queries, or one query with OR-concatenated IN lists? You did not provide details, so it's hard to comment. In any case, two independent queries should have no influence.
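For comparison with running independent queries per chunk, here is a minimal sketch of the other variant mentioned in point 5: a single statement whose IN list is split into OR-concatenated groups of at most 1000 bind placeholders. The class name is hypothetical; the table and column (`tab`, `grp_id`) are the ones from the test setup in this answer.

```java
import java.util.Collections;
import java.util.StringJoiner;

// Hypothetical builder: one SQL statement with OR-ed IN lists, each holding
// at most maxPerList "?" placeholders.
final class OrInListBuilder {
    private OrInListBuilder() {}

    static String build(int idCount, int maxPerList) {
        if (idCount <= 0 || maxPerList <= 0) {
            throw new IllegalArgumentException("idCount and maxPerList must be positive");
        }
        StringJoiner groups = new StringJoiner(" or ", "select * from tab where (", ")");
        for (int done = 0; done < idCount; done += maxPerList) {
            int n = Math.min(maxPerList, idCount - done);
            // one "grp_id in (?, ?, ...)" group with n placeholders
            groups.add("grp_id in (" + String.join(", ", Collections.nCopies(n, "?")) + ")");
        }
        return groups.toString();
    }

    public static void main(String[] args) {
        // select * from tab where (grp_id in (?, ?) or grp_id in (?, ?) or grp_id in (?))
        System.out.println(build(5, 2));
    }
}
```

Note that this variant produces a different SQL text for every total list length, which matters for the parsing concern about large IN lists.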

Having said that, one word of warning.

The limit on the IN list size has its purpose. It is acceptable for an ad hoc script to use a large IN list, but for a regularly running query this could be a parsing problem. Why?

You use bind variables so that the following queries are treated as a single statement (which is parsed only once):

select * from tab where ID = 1
select * from tab where ID = 2

which leads to

select * from tab where ID = ?

But the following two queries (with IN lists of different lengths) remain different, and each must be parsed separately:

select * from tab where ID in ( ? )
select * from tab where ID in ( ?, ? )
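One common way to limit the number of distinct statements is "IN list padding": round the number of placeholders up to the next power of two and bind the last value repeatedly for the extra slots. The sketch below is a hypothetical helper, with the padding capped at Oracle's 1000-expression limit. It counts the distinct SQL texts over all list lengths from 1 to 1000. Hibernate has a comparable setting, `hibernate.query.in_clause_parameter_padding`, if your version supports it.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of IN list padding: list lengths are rounded up to the
// next power of two (capped at 1000), so few distinct SQL texts are parsed.
final class InListPadding {
    static final int ORACLE_MAX_IN_LIST = 1000;

    private InListPadding() {}

    static int paddedSize(int n) {
        if (n <= 0 || n > ORACLE_MAX_IN_LIST) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        int size = Integer.highestOneBit(n);
        if (size < n) {
            size <<= 1; // round up to the next power of two
        }
        return Math.min(size, ORACLE_MAX_IN_LIST);
    }

    static String sqlFor(int placeholders) {
        return "select * from tab where ID in (" + String.join(", ", Collections.nCopies(placeholders, "?")) + ")";
    }

    public static void main(String[] args) {
        Set<String> plain = new HashSet<>();
        Set<String> padded = new HashSet<>();
        for (int n = 1; n <= ORACLE_MAX_IN_LIST; n++) {
            plain.add(sqlFor(n));
            padded.add(sqlFor(paddedSize(n)));
        }
        System.out.println(plain.size());  // 1000 distinct statements
        System.out.println(padded.size()); // 11 distinct statements: 1, 2, 4, ..., 512, 1000
    }
}
```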

So reconsider whether, for your purpose with 30K+ rows, Hibernate is the best option.

Hibernate was designed to elegantly get rid of the need to use SQL, which the majority of developers consider a cool thing (contrary to the majority of DB people, who hold the opposite opinion ;).

This concept works fine, and the simpler the use case, the better. On the other hand, for batch processing it is sometimes better to approach it directly with SQL.

Test data:

create table tab as 
select 
rownum id,
trunc(rownum /  50) +1 grp_id,
rpad('x',100,'y') pad
from dual connect by level <= 1000000;
create index idx on tab(grp_id);
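As a sanity check of the test data, this sketch reproduces the `grp_id = trunc(rownum / 50) + 1` formula in Java (the class name is hypothetical). It counts the rows whose key falls in the 1000 keys 13001..14000 queried above. It also shows that every group has 50 rows except the first, which has 49 because rownum starts at 1, and the last, which has only 1.

```java
// Hypothetical check of the test data: grp_id = trunc(rownum / 50) + 1
// for rownum 1..totalRows, counting rows with grp_id in [firstGrp, lastGrp].
final class TestDataCheck {
    private TestDataCheck() {}

    static long rowsForGroups(long firstGrp, long lastGrp, long totalRows) {
        long count = 0;
        for (long rownum = 1; rownum <= totalRows; rownum++) {
            long grpId = rownum / 50 + 1; // integer division equals trunc for positive values
            if (grpId >= firstGrp && grpId <= lastGrp) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(rowsForGroups(13001, 14000, 1_000_000)); // 50000
        System.out.println(rowsForGroups(1, 1, 1_000_000));         // 49
        System.out.println(rowsForGroups(20001, 20001, 1_000_000)); // 1
    }
}
```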

JDBC is generally faster than JPA, but with JPA you can benefit from caching and thereby get better performance.

I don't know the purpose of this query or how it is used (reporting?), but you should consider using different criteria than just a list of so many IDs. I doubt a user has chosen 1000+ IDs manually, so I guess they are selected in bulk by some other criteria. Try to use those criteria instead.
