简体   繁体   中英

Optimizing an Oracle group by/aggregate query

I have a query I would like to optimize. This is the query:

SELECT CONN.connNum, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
FROM INTER
INNER JOIN CONN ON (INTER.IDConn_FK = CONN.IDConn)
GROUP BY INTER.IDConn_FK, CONN.connNum;

These are the explain plan results:

------------------------------------------------------------------------------------------------                                                                                                                                                                                                             
| Id  | Operation                     | Name           | Rows  | Bytes | Cost (%CPU)| Time     |                                                                                                                                                                                                             
------------------------------------------------------------------------------------------------                                                                                                                                                                                                             
|   0 | SELECT STATEMENT              |                |     3 |   171 |     7  (15)| 00:00:01 |                                                                                                                                                                                                             
|   1 |  HASH GROUP BY                |                |     3 |   171 |     7  (15)| 00:00:01 |                                                                                                                                                                                                             
|   2 |   NESTED LOOPS                |                |     3 |   171 |     6   (0)| 00:00:01 |                                                                                                                                                                                                             
|   3 |    NESTED LOOPS               |                |     3 |   171 |     6   (0)| 00:00:01 |                                                                                                                                                                                                             
|   4 |     TABLE ACCESS FULL         | INTER          |     3 |    78 |     3   (0)| 00:00:01 |                                                                                                                                                                                                             
|*  5 |     INDEX UNIQUE SCAN         | SYS_C002012172 |     1 |       |     0   (0)| 00:00:01 |                                                                                                                                                                                                             
|   6 |    TABLE ACCESS BY INDEX ROWID| CONN           |     1 |    31 |     1   (0)| 00:00:01 |                                                                                                                                                                                                             
------------------------------------------------------------------------------------------------  

I've tried using more specific SELECTs, but the results are the same (something like FROM (SELECT IDConn_FK, walkingDistanceMinutes FROM INTER) I etc). Can you please show me a way to get the cost down?

It was very useful to know if IDConn_FK and connNum were unique on their table because this changes lots of things.

  1. If they're both unique on their tables, you wouldn't need to group results because there wouldn't be multiple occurrences of the same value for connNum. So, in this case, one optimizations would be to not group by because there is only a single value of walkingDistanceMinutes corresponding to each connNum. Removing an unneeded group by would be the right optimization here.

  2. If just connNum is unique on CONN, then one way to optimize this query may be to limit the size of the resources needed to sort the elements during the MIN evaluation. This can be done using a subquery that will also limit the number of rows involved in the join. Here you can use query #1

  3. If only IDConn_FK is unique then the query is fine as it is. Query #2 may help you a little, but not really much.

  4. If none of the two columns is unique, you can always try to limit the number of rows involved in the join through a subquery like for case #2, but you will also need to re-evaluate the MIN once more because you need it corresponding to connNum(that relies on table CONN). Don't think that grouping twice will be more expensive than doing it at once: this is a sort of divide-et-impera approach(separate a complex problem into more simple problems and the recombine their results together to get the solution for the complex problem). Here you could use query #2.

Query #1:

SELECT CONN.connNum, minimalWalkingDistance
FROM (
        select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
        from INTER
        GROUP BY INTER.IDConn_FK
    ) inter
    JOIN CONN using (IDConn)

Query #2

SELECT CONN.connNum, MIN(INTER.minimalWalkingDistance) AS minimalWalkingDistance
FROM (
        select INTER.IDConn_FK as IDConn, MIN(INTER.walkingDistanceMinutes) AS minimalWalkingDistance
        from INTER
        GROUP BY INTER.IDConn_FK
    ) inter
    JOIN CONN using (IDConn)
group by CONN.connNum

And last one more thing to know: don't always consider execution plan cost as God's word, there are many times where queries with high cost are more efficient than others with lower cost. Especially when there are a high number of joins and aggregations.

For your size of data, there is no real optimization possible. For larger data, Oracle should choose other execution paths. You might try this:

select c.connNum,
       (select min(i.walkingDistanceMinutes
        from inter i
        where i.IDConn_FK = c.idConn
      ) as minimalWalkingDistance
from conn c ;

I'm not 100% sure this is exactly the same query. I'm assuming that idConn is the primary key on the conn table.

Create a unique index on Conn (IDConn, connNum).

This should remove the last live off the query plan as the index can satisfy all needed columns.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM