
SQL optimization options in Java

Let's say I have a basic query like:

SELECT a, b, c FROM x WHERE y=[Z]

In this query, [Z] is a "variable" with different values injected into the query.

Now consider a situation where we want to run the same query with two known, different values of [Z], say Z1 and Z2. We can make two separate queries:

SELECT a, b, c FROM x WHERE y=Z1

SELECT a, b, c FROM x WHERE y=Z2

Or perhaps we can programmatically craft a different query like:

SELECT a, b, c FROM x WHERE y in (Z1, Z2)

Now we only have one query (1 < 2), but query construction and result set deconstruction become slightly more complicated, since we're no longer doing straightforward simple queries.

Questions:

  • What is this kind of optimization called? (Is it worth doing?)
  • How can it be implemented cleanly from a Java application?
    • Do existing Java ORM technologies help?

What is this kind of optimization called?

I'm not sure if there is a "proper" term for it, but I've heard it called query batching or just plain batching.

(Is it worth doing?)

It depends on:

  • whether it is worth the effort of optimizing the query at all,
  • the number of elements in the set, i.e. in the ... IN ( ... ) list,
  • the overheads of making a JDBC request versus the costs of query compilation, etc.

But in the right circumstances this is definitely a worthwhile optimization.

How can it be implemented cleanly from a Java application?

It depends on your definition of "clean" :-)

Do existing Java ORM technologies help?

It depends on the specific ORM technology you are talking about, but (for example) the Hibernate HQL language supports the constructs that would allow you to do this kind of thing.

An RDBMS can normally return the result of a query with IN in equal or less time than it takes to execute two queries.

If there is no index on column Y, then a full table scan is required. With two queries, two table scans will be performed instead of one.

If there is an index, then the single value in the WHERE clause, or the values in the IN list, are used one at a time to look up the index. When some rows are found for one of the values in the IN list, they are added to the returned result.

So it is better to use the IN predicate from the performance point of view.

When Y represents a column with unique values, then it is easy to decompose the result. Otherwise, there is slightly more work.
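
As a rough sketch of that "slightly more work", the combined rows can be grouped by their y value after the query returns. This assumes two things that are not in the original question: the query is changed to also select y (so each row carries its key), and the rows have already been copied into a hypothetical Row holder class. ResultSplitter and splitByKey are likewise made-up names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row holder: assumes the query becomes
// SELECT y, a, b, c FROM x WHERE y IN (...), so each row carries its key.
final class Row {
    final int y;
    final String a;

    Row(int y, String a) {
        this.y = y;
        this.a = a;
    }
}

final class ResultSplitter {
    // Groups rows by y; every requested key gets an entry, possibly an empty one.
    static Map<Integer, List<Row>> splitByKey(List<Row> rows, int... requestedKeys) {
        Map<Integer, List<Row>> byKey = new LinkedHashMap<>();
        for (int key : requestedKeys) {
            byKey.put(key, new ArrayList<>());
        }
        for (Row row : rows) {
            // computeIfAbsent also tolerates rows whose key was not requested
            byKey.computeIfAbsent(row.y, k -> new ArrayList<>()).add(row);
        }
        return byKey;
    }
}
```

When y is unique, each list holds at most one row, which is the easy case described above. Pre-filling the map means a value that matched nothing still shows up with an empty list, the same thing a separate query for that value would have told you.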

Honestly, you can't really tell how much of a hit (if any) you would take by running the two queries as prepared statements (even using plain JDBC) rather than combining them with an IN statement.

If you have an array or List of values, you could manually build the prepared statement using JDBC:

// Assumes values is an int[] and conn is a java.sql.Connection.
// Requires java.sql.PreparedStatement, java.util.Collections and
// org.apache.commons.lang3.StringUtils (Apache Commons Lang).

StringBuilder query = new StringBuilder("SELECT a, b, c FROM x WHERE y IN (");

// One "?" placeholder per value, e.g. "?,?,?"
query.append(StringUtils.join(Collections.nCopies(values.length, "?"), ','));
query.append(")");

PreparedStatement stmt = conn.prepareStatement(query.toString());

// JDBC parameter indexes are 1-based
for (int i = 0; i < values.length; i++) {
    stmt.setInt(i + 1, values[i]);
}

stmt.execute();
// Get results after this

Note: I haven't actually tested this. In theory, if you used this a lot, you'd generalize this and make it a method.
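
One possible shape for that generalized method, as a sketch rather than a tested implementation. It uses only the JDK (String.join instead of Commons StringUtils); the class and method names (InQuery, placeholders, buildSql, prepare) are invented for illustration, and the table and column names are the ones from the question.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;

final class InQuery {
    // Returns "?,?,?" for count = 3; an IN list needs at least one placeholder.
    static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("IN list needs at least one value");
        }
        return String.join(",", Collections.nCopies(count, "?"));
    }

    // Builds the statement text for the given number of values.
    static String buildSql(int count) {
        return "SELECT a, b, c FROM x WHERE y IN (" + placeholders(count) + ")";
    }

    // Prepares the statement and binds each value; the caller executes and closes it.
    static PreparedStatement prepare(Connection conn, int... values) throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(buildSql(values.length));
        try {
            for (int i = 0; i < values.length; i++) {
                stmt.setInt(i + 1, values[i]); // JDBC parameters are 1-based
            }
        } catch (SQLException e) {
            stmt.close(); // do not leak the statement if binding fails
            throw e;
        }
        return stmt;
    }
}
```

A caller would typically wrap the returned statement in try-with-resources and call executeQuery() on it. Keeping the SQL text in its own method makes the string building easy to check without a database connection.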

Note that an "in" (where blah in (1, 5, 10)) is the same as writing "where blah = 1 OR blah = 5 OR blah = 10". This is important if you are using, say, Apache Torque, which would create lovely prepared statements except in the case of an "in" clause. (That might be fixed by now.)

And the difference in performance that we found between the unprepared IN clause and the prepared ORs was huge.
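
For illustration, the prepared-OR form could be generated along these lines. This is a minimal sketch with invented names (OrChain, orPredicate), not what Torque or any other ORM does internally, and it assumes the column name is a trusted identifier rather than user input.

```java
import java.util.Collections;

final class OrChain {
    // Builds "blah = ? OR blah = ? OR blah = ?" for column "blah" and count = 3.
    // The column name is concatenated as-is, so it must be a trusted identifier.
    static String orPredicate(String column, int count) {
        if (count < 1) {
            throw new IllegalArgumentException("need at least one value");
        }
        return String.join(" OR ", Collections.nCopies(count, column + " = ?"));
    }
}
```

The resulting text goes after WHERE in a PreparedStatement, with the values bound by position exactly as for an IN list.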

So a number of ORMs handle it, but not all of 'em handle it well. Be sure to examine the queries sent to the database.

And while deconstructing the combined result set from a single query might be more difficult than handling a single result, it's probably a lot easier than trying to combine two result sets from two queries. And probably significantly faster if a lot of duplicates are involved.
