Oracle 12c - Efficient way to join max date record

I have the following query, which joins a table to the most recent record for a given EMPLOYEE_ID. I am wondering whether there is a more efficient/faster way of retrieving the most recent record, and if so, what the best way would be.

SELECT * FROM EMPLOYEE
WHERE NOT EXISTS (
                       SELECT 1
                       FROM EMPLOYEE D
                       JOIN EMPLOYEE_HISTORY E
                               ON  E.EMPLOYEE_ID = D.EMPLOYEE_ID
                               AND E.CREATE_DATE IN (SELECT MAX(CREATE_DATE) 
                                                   FROM EMPLOYEE_HISTORY 
                                                   WHERE EMPLOYEE_ID = D.EMPLOYEE_ID)
                  )

When I compared its explain plan with that of the following query, the query below appeared to be MORE costly.

SELECT *
FROM EMPLOYEE
WHERE NOT EXISTS 
    (SELECT 1
       FROM EMPLOYEE D
       JOIN   (
            SELECT  E.*
            FROM EMPLOYEE_HISTORY E 
            INNER JOIN  (
                            SELECT  EMPLOYEE_ID
                                ,   MAX(CREATE_DATE) max_date
                            FROM EMPLOYEE_HISTORY E2 
                            GROUP BY EMPLOYEE_ID
                            ) EE
                            ON  EE.EMPLOYEE_ID = E.EMPLOYEE_ID
                            AND EE.max_date = E.CREATE_DATE
              ) A
       ON  A.EMPLOYEE_ID = D.EMPLOYEE_ID 
       AND ROWNUM = 1)

So does that mean the first query is indeed better?

There is no separate index on CREATE_DATE; however, the PK is on (EMPLOYEE_ID, CREATE_DATE).

I would write the query using = rather than IN:

 SELECT 1
 FROM EMPLOYEE E JOIN
      EMPLOYEE_HISTORY EH
      ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID AND
         EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE) 
                           FROM EMPLOYEE_HISTORY EH2
                           WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID
                          );

IN is more general than =; since the MAX subquery returns exactly one value, = expresses the comparison more precisely.

Your primary key index on (EMPLOYEE_ID, CREATE_DATE) should be used for the subquery, which should make it pretty fast.
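As a minimal sketch of the =-based query, the snippet below runs it against an in-memory SQLite database standing in for Oracle 12c. The NAME and TITLE columns and all sample rows are hypothetical; only the table names, EMPLOYEE_ID, CREATE_DATE, and the (EMPLOYEE_ID, CREATE_DATE) primary key come from the question. Dates are stored as ISO-8601 text so that comparison and MAX follow chronological order.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for Oracle 12c here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMPLOYEE_ID INTEGER PRIMARY KEY,
        NAME        TEXT
    );
    CREATE TABLE EMPLOYEE_HISTORY (
        EMPLOYEE_ID INTEGER,
        -- Oracle makes PK columns NOT NULL implicitly; SQLite needs it spelled out.
        CREATE_DATE TEXT NOT NULL,
        TITLE       TEXT,
        PRIMARY KEY (EMPLOYEE_ID, CREATE_DATE)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO EMPLOYEE_HISTORY VALUES
        (1, '2020-01-01', 'Clerk'),
        (1, '2022-06-01', 'Manager'),
        (2, '2021-03-15', 'Engineer');
""")

# Correlated scalar subquery compared with "=": MAX yields one value per employee.
LATEST_SQL = """
    SELECT E.EMPLOYEE_ID, EH.CREATE_DATE, EH.TITLE
    FROM EMPLOYEE E
    JOIN EMPLOYEE_HISTORY EH
      ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID
     AND EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE)
                           FROM EMPLOYEE_HISTORY EH2
                           WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID)
    ORDER BY E.EMPLOYEE_ID
"""

latest = conn.execute(LATEST_SQL).fetchall()
for row in latest:
    print(row)
```

Employee 3 has no history rows, so the inner join drops it. Because the primary key allows at most one row per (EMPLOYEE_ID, CREATE_DATE), every remaining employee appears exactly once.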

Assuming that you actually want to return real columns rather than just 1, I'm not sure there is a way to make this faster.

If you really are selecting only 1, then forget the most recent record and just use EXISTS:

 SELECT 1
 FROM EMPLOYEE E
 WHERE EXISTS (SELECT 1
               FROM EMPLOYEE_HISTORY EH2
               WHERE EH2.EMPLOYEE_ID = E.EMPLOYEE_ID
              );

The only additional condition your query checks for is that CREATE_DATE is not NULL, but I'm guessing that is always true anyway.
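To illustrate the point about selecting only 1, this sketch uses the same hypothetical SQLite stand-in and sample rows, runs both the MAX join and the EXISTS query, and checks that they return the same rows. That holds here because CREATE_DATE is never NULL and the primary key allows only one history row per employee and date, so the join produces exactly one row per employee that has any history.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for Oracle 12c here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER PRIMARY KEY);
    CREATE TABLE EMPLOYEE_HISTORY (
        EMPLOYEE_ID INTEGER,
        CREATE_DATE TEXT NOT NULL,   -- mirrors Oracle's implicit NOT NULL on PK columns
        PRIMARY KEY (EMPLOYEE_ID, CREATE_DATE)
    );
    INSERT INTO EMPLOYEE VALUES (1), (2), (3);
    INSERT INTO EMPLOYEE_HISTORY VALUES
        (1, '2020-01-01'), (1, '2022-06-01'), (2, '2021-03-15');
""")

# Join restricted to the most recent history row per employee.
JOIN_SQL = """
    SELECT 1
    FROM EMPLOYEE E
    JOIN EMPLOYEE_HISTORY EH
      ON EH.EMPLOYEE_ID = E.EMPLOYEE_ID
     AND EH.CREATE_DATE = (SELECT MAX(EH2.CREATE_DATE)
                           FROM EMPLOYEE_HISTORY EH2
                           WHERE EH2.EMPLOYEE_ID = EH.EMPLOYEE_ID)
"""

# Only asks whether any history row exists; no MAX needed.
EXISTS_SQL = """
    SELECT 1
    FROM EMPLOYEE E
    WHERE EXISTS (SELECT 1
                  FROM EMPLOYEE_HISTORY EH2
                  WHERE EH2.EMPLOYEE_ID = E.EMPLOYEE_ID)
"""

join_rows = conn.execute(JOIN_SQL).fetchall()
exists_rows = conn.execute(EXISTS_SQL).fetchall()
print(len(join_rows), len(exists_rows))
```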

Use the RANK (or DENSE_RANK or ROW_NUMBER) analytic function:

SELECT 1
FROM EMPLOYEE E
JOIN   (
  SELECT *
  FROM   (
    SELECT  H.*,
            RANK() OVER ( PARTITION BY EMPLOYEE_ID ORDER BY CREATE_DATE DESC ) AS rnk
    FROM    EMPLOYEE_HISTORY H
  )
  WHERE rnk = 1
) H
ON  H.EMPLOYEE_ID = E.EMPLOYEE_ID
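A sketch of the analytic-function version on the same hypothetical SQLite stand-in (SQLite supports window functions from version 3.25). Because the primary key rules out two history rows with the same EMPLOYEE_ID and CREATE_DATE, RANK, DENSE_RANK, and ROW_NUMBER all select the same single row here. If ties were possible, RANK and DENSE_RANK would keep every tied row, whereas ROW_NUMBER would keep exactly one, chosen arbitrarily unless the ORDER BY breaks the tie.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for Oracle 12c here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMPLOYEE_ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE EMPLOYEE_HISTORY (
        EMPLOYEE_ID INTEGER,
        CREATE_DATE TEXT NOT NULL,
        TITLE       TEXT,
        PRIMARY KEY (EMPLOYEE_ID, CREATE_DATE)
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO EMPLOYEE_HISTORY VALUES
        (1, '2020-01-01', 'Clerk'),
        (1, '2022-06-01', 'Manager'),
        (2, '2021-03-15', 'Engineer');
""")

# {fn} is filled in with RANK, DENSE_RANK or ROW_NUMBER below.
RANKED_SQL = """
    SELECT E.EMPLOYEE_ID, H.CREATE_DATE, H.TITLE
    FROM EMPLOYEE E
    JOIN (
      SELECT *
      FROM (
        SELECT H.*,
               {fn}() OVER (PARTITION BY EMPLOYEE_ID ORDER BY CREATE_DATE DESC) AS rnk
        FROM EMPLOYEE_HISTORY H
      )
      WHERE rnk = 1
    ) H
    ON H.EMPLOYEE_ID = E.EMPLOYEE_ID
    ORDER BY E.EMPLOYEE_ID
"""

# Run the same query with each ranking function and collect the rows.
results = {
    fn: conn.execute(RANKED_SQL.format(fn=fn)).fetchall()
    for fn in ("RANK", "DENSE_RANK", "ROW_NUMBER")
}
for fn, rows in results.items():
    print(fn, rows)
```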

What if the CREATE_DATE of the EMPLOYEE must be later than the maximum CREATE_DATE for that EMPLOYEE_ID in EMPLOYEE_HISTORY?

Then, for that EMPLOYEE_ID, no row exists in EMPLOYEE_HISTORY with an equal or later CREATE_DATE, which NOT EXISTS can express directly:

SELECT * 
FROM EMPLOYEE Emp
WHERE NOT EXISTS (
    SELECT 1
    FROM EMPLOYEE_HISTORY Hist
    WHERE Hist.EMPLOYEE_ID = Emp.EMPLOYEE_ID
      AND Hist.CREATE_DATE >= Emp.CREATE_DATE
)
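A sketch of this NOT EXISTS query on hypothetical SQLite data, assuming (as this answer does) that EMPLOYEE has its own CREATE_DATE column. It also runs an explicit Emp.CREATE_DATE > MAX(...) formulation for comparison: the two differ for an employee with no history rows, whom NOT EXISTS returns but the MAX comparison drops, because MAX over zero rows is NULL and a comparison with NULL is never true.

```python
import sqlite3

# Hypothetical schema and data; SQLite is only a stand-in for Oracle 12c here.
# Assumes EMPLOYEE carries its own CREATE_DATE, as this answer does.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMPLOYEE_ID INTEGER PRIMARY KEY,
        CREATE_DATE TEXT NOT NULL
    );
    CREATE TABLE EMPLOYEE_HISTORY (
        EMPLOYEE_ID INTEGER,
        CREATE_DATE TEXT NOT NULL,
        PRIMARY KEY (EMPLOYEE_ID, CREATE_DATE)
    );
    INSERT INTO EMPLOYEE VALUES
        (1, '2023-01-01'),   -- later than all of its history
        (2, '2021-03-15'),   -- equal to its latest history row
        (3, '2022-02-02');   -- no history rows at all
    INSERT INTO EMPLOYEE_HISTORY VALUES
        (1, '2020-01-01'), (1, '2022-06-01'), (2, '2021-03-15');
""")

# Keep employees with no history row on or after their own CREATE_DATE.
NOT_EXISTS_SQL = """
    SELECT Emp.EMPLOYEE_ID
    FROM EMPLOYEE Emp
    WHERE NOT EXISTS (
        SELECT 1
        FROM EMPLOYEE_HISTORY Hist
        WHERE Hist.EMPLOYEE_ID = Emp.EMPLOYEE_ID
          AND Hist.CREATE_DATE >= Emp.CREATE_DATE
    )
    ORDER BY Emp.EMPLOYEE_ID
"""

# Explicit MAX comparison: NULL for employees without history, so they drop out.
MAX_SQL = """
    SELECT Emp.EMPLOYEE_ID
    FROM EMPLOYEE Emp
    WHERE Emp.CREATE_DATE > (SELECT MAX(Hist.CREATE_DATE)
                             FROM EMPLOYEE_HISTORY Hist
                             WHERE Hist.EMPLOYEE_ID = Emp.EMPLOYEE_ID)
    ORDER BY Emp.EMPLOYEE_ID
"""

not_exists_ids = [r[0] for r in conn.execute(NOT_EXISTS_SQL)]
max_ids = [r[0] for r in conn.execute(MAX_SQL)]
print(not_exists_ids, max_ids)  # [1, 3] [1]
```

Which behavior is right depends on whether employees without any history should be included.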
