简体   繁体   中英

How to return the same period last year data with SQL?

I am trying to create a view in postgreSQL with the requirements as below:

The table needs to show the same period last year data for every records.

Sample data:

date_sk  | location_sk | division_sk | employee_type_sk | value
20180202 | 6           | 8           | 4                | 1
20180202 | 7           | 2           | 4                | 2
20190202 | 6           | 8           | 4                | 1
20190202 | 7           | 2           | 4                | 1
20200202 | 6           | 8           | 4                | 1
20200202 | 7           | 2           | 4                | 3

In the table, date_sk, location_sk, division_sk and employee_type_sk are super keys which form an unique record in the table. You can check the required output as below:

date_sk  | location_sk | division_sk | employee_type_sk | value     | value_last_year
20180202 | 6           | 8           | 4                | 1         | NULL
20180203 | 7           | 2           | 4                | 2         | NULL
20190202 | 6           | 8           | 4                | 1         | 1
20190203 | 7           | 3           | 4                | 1         | NULL
20200202 | 6           | 8           | 4                | 1         | 1
20200203 | 7           | 3           | 4                | 3         | 1

The records start on 20180202, therefore, the data for the same period last year is unavailable. At the 4th record, there is a difference in division_sk comparing with the same period last year - hence, the head_count_last_year is NULL. My current solution is to create a view from the sample data with an addition column as same_date_last_year then LEFT JOIN the same table. The SQL queries are below:

CREATE VIEW test_view AS
SELECT *, 
    CONCAT(LEFT(date_sk, 4) - 1, RIGHT(date_sk, 4)) AS same_date_last_year 
FROM test_table 

SELECT
    test_view.date_sk,
    test_view.location_sk,
    test_view.division_sk,
    test_view.employee_type_sk,
    test_view.value,
    test_table.value AS value_last_year
FROM test_view
LEFT JOIN test_table ON (test_view.same_date_last_year  = test_table.date_sk)

We have a lot of data in the table. My solution above is unacceptable in terms of performance. Is there a different query which yields the same result and might improve the performance ?

You could simply use a correlated subquery here which is likely best for performance:

select *, 
    (
        select value from t t2 
        where t2.date_sk=t.date_sk - interval '1' year and
          t2.location_sk=t.location_sk and
          t2.division_sk=t.division_sk and
          t2.employee_type_sk=t.employee_type_sk
    ) as value_last_year
from t
WITH CTE(DATE_SK,LOCATION_SK,DIVISION_SK,EMPLOYEE_TYPE_SK,VALUE)AS
 (
    SELECT CAST('20180202' AS DATE),6,8,4,1 UNION ALL 
    SELECT CAST('20180203'AS DATE),7,2,4,2 UNION ALL 
    SELECT CAST('20190202'AS DATE),6,8,4,1 UNION ALL 
    SELECT CAST('20190203'AS DATE),7,2,4,1 UNION ALL 
    SELECT CAST('20200202'AS DATE),6,8,4,1 UNION ALL 
    SELECT CAST('20200203'AS DATE),7,2,4,3  
 )
 SELECT C.DATE_SK,C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK,C.VALUE,
    LAG(C.VALUE)OVER(PARTITION BY C.LOCATION_SK,C.DIVISION_SK,C.EMPLOYEE_TYPE_SK    ORDER BY C.DATE_SK ASC)LAGG
     FROM CTE AS C
  ORDER BY C.DATE_SK ASC;

Could you please try if the above is suitable for you. I assume,DATE_SK is a date column or can be CAST to a date

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM