简体   繁体   中英

SQL query returns different results when it is used in subquery

I am trying to understand why these two queries are returning different results. The first query is a sub-query in the second one. The first query is returning accurate results while the second one is not. I am primarily interested in the ACH_CODE and ACH_PRIOR identifiers.

The first query returns 0 and null respectively while the second returns 0 and 1. This is a huge problem since I eventually want to select accounts that are greater than 0 for one or the other identifiers. I have double checked the system and this account should not be in my population but is ending up in the population along with other similar accounts.

Query 1:

SELECT 
    ACCOUNT_NUMBER, MATURITY_DATE, SNAPSHOT_DATE, ACH_CODE, 
    LAG(ACH_CODE, 1) OVER(ORDER BY SNAPSHOT_DATE) AS ACH_PRIOR
FROM 
    PORTFOLIO T1
WHERE 
    SNAPSHOT_DATE = (DATE*)
    AND ACCOUNT_NUMBER = 81925169

Query 2:

SELECT 
    ACCOUNT_NUMBER, MATURITY_DATE, SNAPSHOT_DATE, ACH_CODE, ACH_PRIOR 
FROM 
    (SELECT 
         ACCOUNT_NUMBER, MATURITY_DATE, SNAPSHOT_DATE, ACH_CODE, 
         LAG(ACH_CODE, 1) OVER(ORDER BY SNAPSHOT_DATE) AS ACH_PRIOR
     FROM 
         PORTFOLIO T1
     WHERE 
         SNAPSHOT_DATE = (DATE*))
WHERE  
    ACCOUNT_NUMBER = 81925169

The 'DATE' in the query is actually another subquery that defines a particular date for each account number. The 'DATE' subquery is the same for both queries. I just removed it for ease of reading.

Any insight would be greatly appreciated.

In your first query the lag() only operates over rows matching the supplied account number. There is no previous row in the table for that account (matching the date from the subquery), so lag gets null.

In the second query it operates over all accounts. There is still no previous row for that account; but there is one for another account (which matches the date). The filter on account number is being applied after the lag has been evaluated. (Which value you get is indeterminate - it could be for any other account, or null, because you only order by date in the lag call.)

You can add a partition clause to restrict it to rows from the same account:

LAG(ACH_CODE, 1) OVER (PARTITION BY ACCOUNT_NUMBER ORDER BY SNAPSHOT_DATE)

db<>fiddle

Although, if snaphot date is unique for each account then it will always get null; and if it isn't the result is still indeterminate as you haven't specified how to break ties.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM