SQL Optimization for DB2

If you are doing a UNION ALL on two result sets, and each result set is derived from an inner join with the same filtered subset of a master table, does the query engine "hit" the master table once or twice?

Example:

SELECT m.col4, st1.col2
FROM master m
     INNER JOIN subTable1 st1
     on st1.col1 = m.col1
     WHERE m.col1 = 'a' and m.col2 = 123 and m.col3 = 'a1b2'
UNION ALL
SELECT m.col4, st2.col2
FROM master m
     INNER JOIN subTable2 st2
     on st2.col1 = m.col1
     WHERE m.col1 = 'a' and m.col2 = 123 and m.col3 = 'a1b2'

I am trying to determine whether it would be beneficial to create a temp table holding the filtered rows of the master table, so that the UNION ALL works with a small subset of the master records instead of filtering the master table twice, as it might be doing in the example above.

Thank you in advance for whatever advice you can give.

Maybe a common table expression helps:

with small_master as (
   select m.col4,
          m.col1
   from master m
   where m.col1 = 'a'
     and m.col2 = 123
     and m.col3 = 'a1b2'
)
SELECT m.col4, st1.col2
FROM small_master m
     INNER JOIN subTable1 st1
     on st1.col1 = m.col1
UNION ALL
SELECT m.col4, st2.col2
FROM small_master m
     INNER JOIN subTable2 st2
     on st2.col1 = m.col1;

In my experience (not with DB2 though) this helps if the CTE is reducing the number of rows drastically (say from "millions" to "thousands").

If the intermediate result of the CTE is (still) quite large (several millions) then this will probably not help.

But only the execution plan can shed light on this.

The easiest way to answer this kind of "what if" question is to look at the query plan. You can generate one with the command:

db2expln -d <your db> -f <your query file> -z <your query delimiter> -gi

Generally speaking, if a task can be done with a single SQL statement that will be the fastest way to accomplish the task, so it is unlikely that creating a temporary table will benefit performance.
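
For comparison, the temp-table variant that the question is weighing could look roughly like the sketch below. The name session.small_master is made up for illustration, and the sketch assumes a user temporary table space is available for declared temporary tables. Comparing its plan and timings against the single-statement version is the only reliable way to tell which one wins.

```sql
-- Sketch only: session.small_master is a hypothetical name, and this assumes
-- a user temporary table space exists for declared temporary tables.
DECLARE GLOBAL TEMPORARY TABLE session.small_master AS (
    SELECT col1, col4 FROM master
) DEFINITION ONLY
ON COMMIT PRESERVE ROWS
NOT LOGGED;

-- Filter the master table once and keep only the columns the joins need.
INSERT INTO session.small_master (col1, col4)
SELECT col1, col4
FROM master
WHERE col1 = 'a' AND col2 = 123 AND col3 = 'a1b2';

-- Both branches of the UNION ALL now read the small temporary table.
SELECT m.col4, st1.col2
FROM session.small_master m
     INNER JOIN subTable1 st1 ON st1.col1 = m.col1
UNION ALL
SELECT m.col4, st2.col2
FROM session.small_master m
     INNER JOIN subTable2 st2 ON st2.col1 = m.col1;
```

Note that this turns one statement into three, so the cost of the extra insert has to be paid back by the two cheaper joins.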

This depends a lot on the database and the statistics of the tables involved. I am not intimately familiar with DB2.

However, if the issue is performance, then consider putting an index on master(col1, col2, col3). This would speed up both parts of the query, because each part filters the master table on exactly these three columns.
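
A minimal sketch of such an index, assuming the intended key columns are col1, col2 and col3 as used in the question's WHERE clause. The index names are hypothetical, and the second variant is an optional alternative, not something the original answer proposed.

```sql
-- Hypothetical index name. All three columns are compared by equality in the
-- question's WHERE clause, so their relative order matters little here.
CREATE INDEX ix_master_c1_c2_c3
    ON master (col1, col2, col3);

-- Optional alternative (an assumption, not part of the original suggestion):
-- appending col4 lets the optimizer answer the master side of both joins
-- from the index alone, at the cost of a wider index.
-- CREATE INDEX ix_master_c1_c2_c3_c4
--     ON master (col1, col2, col3, col4);
```

After creating the index, refresh the table statistics and look at the plan again to confirm the optimizer actually picks it up.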

The use of a CTE as a temp table is highly database specific. Postgres (at least before version 12) always instantiates CTEs, so the code is only run once. SQL Server never does. I do not know the behavior of DB2 in this regard. However, I would prefer to add indexes to explicitly improve performance rather than fiddling with the query: your new query may result in unexpected query plans when table statistics change, new software is released, or hardware is upgraded.

As references for SQL Server behavior, you might be interested in this one or this one or this discussion .
