简体   繁体   中英

Validating optimization of an Oracle query

Ok, so I'm working on this (rather old) project at work, which uses loads of queries towards an Oracle database. I recently stumbled upon this gem, which takes about 6-7 hours to run and returns ~1400 rows. The table/view in question contains ~200'000 rows. I thought this felt like it was taking maybe a little longer than seemed reasonable, so I started having a closer look at it. Now I can't, for security/proprietary reasons, share the exact query, but this should show what the query does in more general terms:

SELECT
    some_field,
    some_other_field
FROM (
    SELECT
        *
    FROM
        some_view a
    WHERE
        some_criteria AND
        a.client_no || ':' || a.engagement_no || ':' || a.registered_date = (
            SELECT
                b.client_no || ':' || b.engagement_no || ':' || MAX(b.registered_date)
            FROM
                some_view b
                JOIN some_engagement_view e
                    ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
                JOIN some_client_view c
                    ON c.client_no = b.client_no
            WHERE
                some_other_criteria AND
                b.client_no = a.client_no AND
                b.engagement_no = a.engagement_no
            GROUP BY
                b.client_no,
                b.engagement_no
        )
);

Basically what it is supposed to do, as far as I've managed to figure out, is to from some_view (which contains evaluations of customers/engagements) fetch the latest evaluation for every unique client/engagement.

The two joins are there to ensure that the client and engagement exists in another system, where they are primarily handled after you have done the evaluation in this system.

Notice how it concatenates two numbers and a date, and then compares that to a sub-query? "Interesting" design-choice. So I thought that if you replace the concatenation with a proper comparison you might get some kind of performance gain at least. Please notice that I primarily develop .NET and for the web, and am far from an expert when it comes to databases, but I rewrote it as follows:

SELECT
    some_field,
    some_other_filed
FROM
    some_view a
WHERE
    some_criteria AND
    (a.client_no, a.engagement_no, a.registered_date) = (
        SELECT
            b.client_no,
            b.engagement_no,
            MAX(b.registered_date)
        FROM
            some_view b
            JOIN some_engagement_view e
                ON e.client_no = b.client_no AND e.engagement_no = b.engagement_no
            JOIN some_client_view c
                ON c.client_no = b.client_no
        WHERE
            some_other_criteria AND
            b.client_no = a.client_no AND
            b.engagement_no = a.engagement_no
        GROUP BY
            b.client_no,
            b.engagement_no
    )
);

Now if I replace the fields in the very first select with a COUNT(1) , I get exactly the same number of rows with both queries, so a good start. The new query fetches data just as fast as it counts, < 10 seconds. The old query gets the count in ~20 seconds, and as I mentioned before, the data takes close to 6-7 hours. It is currently running so that I can do some kind of analysis to see if the new query is valid, but I thought that I'd ask here as well to see if there is anything apparently wrong that I have done?

EDIT Also removed the outer-most query, which did not seem to fulfill any kind of purpose, except maybe making the query look cooler.. or something.. I dunno..

Expanding on my comment... if I try to replicate your query structure using built-in views it also runs for a long time. For example, getting the most recently created table for each owner (purely for demo purposes, it can be done more simply) like this takes several minutes, with either version:

SELECT
    owner,
    object_name
FROM
    all_objects a
WHERE
    (a.owner, a.object_type, TRUNC(a.created)) = (
        SELECT
            b.owner, b.object_type, TRUNC(MAX(b.created))
        FROM
            all_objects b
            JOIN all_tables e
                ON e.owner = b.owner and e.table_name = b.object_name
            JOIN all_users c
                ON c.username = b.owner
        WHERE
            b.owner = a.owner AND
            b.object_type = a.object_type
        GROUP BY
            b.owner,
            b.object_type
    );

If I rewrite that to avoid the self-join on all_objects (equivalent to some_view in your example) by using an analytic function instead:

SELECT
    owner,
    object_name
FROM (
    SELECT
        a.owner,
        a.object_name,
        row_number() over (partition by a.owner, a.object_type
            order by a.created desc) as rn
    FROM
        all_objects a
        JOIN all_tables e
            ON e.owner = a.owner and e.table_name = a.object_name
        JOIN all_users c
            ON c.username = a.owner
    )
WHERE
    rn = 1;

... then it takes a few seconds.

Now, in this case I don't get exactly the same output because I have multiple objects created at the same time (within the same second as far as created is concerned).

I don't know the precision of the values stored in your registered_date of course. So you might need to look at different functions, possibly rank rather than row_number , or adjust the ordering to deal with ties if necessary.

        rank() over (partition by a.owner, a.object_type
            order by trunc(a.created) desc) as rn
...
WHERE
    rn = 1;

gives me the same results (well, almost; the join to all_tables is also skewing things, as I seem to have tables listed in all_objects that aren't in all_tables , but that's a side issue). Or max could work too:

        max(created) over (partition by a.owner, a.object_type) as mx
...
WHERE
    TRUNC(created) = TRUNC(mx)

In both of those I'm using trunc to get everything on the same day; you may not need to if your registered_date doesn't have a time component.

But of course, check you do actually get the same results.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM