I'm looking for a way to get the latest occurrence (by col3) for each combination of col1 and col2.
Let's suppose we have the following table (all rows needed are marked with *):
col1    col2  col3
------  ----  ----------
002478  ABC   2019-08-23 *
002478  ABC   2019-05-14
002588  CVMG  2019-01-07 *
002588  IP    2019-01-31 *
002588  MMG   2019-09-04 *
002588  MMG   2019-08-28
002588  NUSA  2019-11-04 *
002588  NUSA  2019-04-24
002746  IE    2019-01-15 *
003467  IE    2020-01-10
003467  IE    2020-03-13 *
I was able to get the latest occurrences based on col1 and col2 with the following select:
SELECT t.col1,
       t.col2,
       t.col3
FROM teste t
WHERE t.col3 IN (SELECT max(a.col3)
                 FROM teste a
                 WHERE a.col1 = t.col1 AND a.col2 = t.col2)
In this example, it takes only about 7 to 10 ms to complete, but on my real database it takes about 1 hour.
I removed all the JOINs that I could, and the best time I've reached is about 55 minutes.
As I'm using Progress, there's no window function (that I'm aware of) like PARTITION BY.
Is there another way to solve this problem? The only query I could think of was in that style.
Here's an SQL Fiddle with that example database.
Another way of writing the same query is to select the rows for which no newer related row exists:
SELECT t.col1, t.col2, t.col3
FROM teste t
WHERE NOT EXISTS
(
  SELECT NULL
  FROM teste t_newer
  WHERE t_newer.col1 = t.col1
    AND t_newer.col2 = t.col2
    AND t_newer.col3 > t.col3
);
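To check that the two formulations really select the same rows, here is a minimal sketch that loads the sample table from the question and runs both queries. It uses Python's built-in sqlite3 module purely as a stand-in, since no Progress database is assumed to be at hand. The ORDER BY clauses and the helper name latest_rows are additions for the comparison and are not part of either original query.

```python
import sqlite3

# (col1, col2, col3) rows copied from the question's sample table.
ROWS = [
    ("002478", "ABC", "2019-08-23"),
    ("002478", "ABC", "2019-05-14"),
    ("002588", "CVMG", "2019-01-07"),
    ("002588", "IP", "2019-01-31"),
    ("002588", "MMG", "2019-09-04"),
    ("002588", "MMG", "2019-08-28"),
    ("002588", "NUSA", "2019-11-04"),
    ("002588", "NUSA", "2019-04-24"),
    ("002746", "IE", "2019-01-15"),
    ("003467", "IE", "2020-01-10"),
    ("003467", "IE", "2020-03-13"),
]

# The question's query; ORDER BY is added only to make results comparable.
MAX_QUERY = """
SELECT t.col1, t.col2, t.col3
FROM teste t
WHERE t.col3 IN (SELECT max(a.col3)
                 FROM teste a
                 WHERE a.col1 = t.col1 AND a.col2 = t.col2)
ORDER BY t.col1, t.col2
"""

# The NOT EXISTS rewrite from the answer, with the same ORDER BY.
NOT_EXISTS_QUERY = """
SELECT t.col1, t.col2, t.col3
FROM teste t
WHERE NOT EXISTS (SELECT NULL
                  FROM teste t_newer
                  WHERE t_newer.col1 = t.col1
                    AND t_newer.col2 = t.col2
                    AND t_newer.col3 > t.col3)
ORDER BY t.col1, t.col2
"""


def latest_rows(query):
    """Run a query against a fresh in-memory copy of the sample table."""
    conn = sqlite3.connect(":memory:")
    # Dates are stored as ISO-8601 text, which sorts chronologically.
    conn.execute("CREATE TABLE teste (col1 TEXT, col2 TEXT, col3 TEXT)")
    conn.executemany("INSERT INTO teste VALUES (?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


max_result = latest_rows(MAX_QUERY)
not_exists_result = latest_rows(NOT_EXISTS_QUERY)

print("same rows:", max_result == not_exists_result)
for row in not_exists_result:
    print(row)
```

Both queries return the seven rows marked with * in the question. This only confirms that the results match; it says nothing about how Progress will execute either form.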
This may be faster, slower, or equally fast, depending on how your DBMS executes it internally.
With either of the two queries, the DBMS has to quickly look up other rows with the same col1 and col2. With only the table, the DBMS would have to read it sequentially again and again. This is where indexes come into play: an index lets the DBMS look up where in the table the matching rows are.
In your case you want an index on col1 and col2, so that the related rows can be found. You can also add col3, as it must be compared, too. Whether the index should start with col1 or col2 may or may not matter. How many different col1 values are in the table, and how many different col2 values? If one has just 5 different values and the other 5,000, then start the index with the one with 5,000 values, because each value then matches fewer rows, i.e. you get to the rows you are interested in faster.
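One way to answer the "how many different values" question is simply to count them. The sketch below does this on the sample rows from the question, again using Python's sqlite3 module as a stand-in for the real database. The helper distinct_counts is a hypothetical name introduced here, and picking the column with the most distinct values only mirrors the rule of thumb above; it is not a guarantee about what is fastest on your data.

```python
import sqlite3

# (col1, col2, col3) rows copied from the question's sample table.
ROWS = [
    ("002478", "ABC", "2019-08-23"), ("002478", "ABC", "2019-05-14"),
    ("002588", "CVMG", "2019-01-07"), ("002588", "IP", "2019-01-31"),
    ("002588", "MMG", "2019-09-04"), ("002588", "MMG", "2019-08-28"),
    ("002588", "NUSA", "2019-11-04"), ("002588", "NUSA", "2019-04-24"),
    ("002746", "IE", "2019-01-15"), ("003467", "IE", "2020-01-10"),
    ("003467", "IE", "2020-03-13"),
]


def distinct_counts(conn, table, columns):
    """Return {column: number of distinct values} for the given columns."""
    counts = {}
    for column in columns:
        # Identifiers cannot be bound as parameters, so they are interpolated
        # into the SQL text; pass only trusted table and column names.
        sql = f"SELECT COUNT(DISTINCT {column}) FROM {table}"
        counts[column] = conn.execute(sql).fetchone()[0]
    return counts


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teste (col1 TEXT, col2 TEXT, col3 TEXT)")
conn.executemany("INSERT INTO teste VALUES (?, ?, ?)", ROWS)

counts = distinct_counts(conn, "teste", ["col1", "col2"])
# Rule of thumb from the text: lead with the column that has more distinct values.
leading = max(counts, key=counts.get)

print(counts)
print("suggested leading index column:", leading)
conn.close()
```

On the 11 sample rows this reports 4 distinct col1 values and 6 distinct col2 values, so the rule of thumb would put col2 first. The sample is far too small to be meaningful, so run the two COUNT(DISTINCT ...) queries against the real table before deciding.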
An index could then look like
create index idx on teste (col1, col2, col3);
The queries stay the same. The DBMS will look at your query and decide whether to use an index or not. For the given queries I am sure it will use the index mentioned, because the queries are all about quickly looking up related rows.
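Whether the index is actually used can be checked by asking the DBMS for its query plan. As an illustration only, the sketch below does this in SQLite through Python's sqlite3 module, using SQLite's EXPLAIN QUERY PLAN statement. Progress has its own way of showing plans, which is not covered here, and the helper name query_plan is an assumption of this example.

```python
import sqlite3

# The NOT EXISTS query from above, unchanged apart from the missing semicolon.
QUERY = """
SELECT t.col1, t.col2, t.col3
FROM teste t
WHERE NOT EXISTS (SELECT NULL
                  FROM teste t_newer
                  WHERE t_newer.col1 = t.col1
                    AND t_newer.col2 = t.col2
                    AND t_newer.col3 > t.col3)
"""


def query_plan(conn, query):
    """Return the human-readable 'detail' text of each step in SQLite's plan."""
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teste (col1 TEXT, col2 TEXT, col3 TEXT)")

plan_before = query_plan(conn, QUERY)

# The index suggested in the answer.
conn.execute("CREATE INDEX idx ON teste (col1, col2, col3)")
plan_after = query_plan(conn, QUERY)

# True if any plan step mentions the index by name.
uses_index = any("idx" in step for step in plan_after)

print("before:", plan_before)
print("after: ", plan_after)
print("index used:", uses_index)
conn.close()
```

The exact wording of the plan lines differs between SQLite versions, so the sketch only checks whether the index name appears in the plan after the index is created. Another DBMS may make a different choice for the same query and index.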