How to get difference in value over a sliding time window?

I'm attempting to write a SQL query that returns every product whose most recent order price within the last 30 days differs from its most recent order price in the 30 days before that, together with the difference between the two prices (the "variance"). I'm currently using PostgreSQL 11.

Data Model

Right now, the data is structured into three tables: orders, products, and a pivot table, order_product. Here is the simplified version of the table structure:

Orders

id  order_date
1   2022-01-15
2   2022-02-15
3   2022-03-08

Products

id  name
1   Some product
2   Another product
3   Yet another product

Order_Product

order_id  product_id  unit_price
1         1           10
1         2           20
1         3           10
2         1           12
2         2           20
2         3           5
3         1           15

Desired Output

The desired output would be something like the following:

id  name                 order_date  latest_unit_price  previous_unit_price  variance
1   Some product         2022-03-08  15                 10                   5
3   Yet another product  2022-02-15  5                  10                   -5

What I've done so far

I've been able to write a join that combines orders and products via the order_product table, restricted to the 60-day window, which is seemingly the easy part:

SELECT
    "products"."id",
    "products"."name",
    "order_product"."unit_price",
    "orders"."order_date"
FROM
    products
    JOIN order_product ON products.id = order_product.product_id
    JOIN orders ON order_product.order_id = orders.id
WHERE
    order_date BETWEEN now() - INTERVAL '60 days'
    AND now()

I've been trying to work with RANK() and LAG(); however, where I'm getting stuck is ranking the rows within each of the two 30-day windows and then calculating the variance between the two windows.

Any help would be much appreciated!

Update: Added solution

Building off of the answer by D-Shih, I had to tweak it so that the two time windows are measured back from the current date:

WITH CTE AS (
    SELECT
        "products"."id",
        "products"."name",
        "order_product"."unit_price",
        "orders"."order_date"
    FROM
        products
        JOIN order_product ON products.id = order_product.product_id
        JOIN orders ON order_product.order_id = orders.id
    WHERE
        order_date BETWEEN now() - INTERVAL '60 days' AND now()
),
CTE2 AS (
    SELECT
        *,
        EXTRACT(DAYS FROM now() - order_date :: timestamp) gap_days
    FROM
        CTE
),
CTE3 AS (
    SELECT
        *,
        (CASE WHEN gap_days < 30 THEN 1 ELSE 0 END) grp
    FROM
        CTE2
)
SELECT
    id,
    name,
    MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
    MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
    MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
    SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM
    (
        SELECT
            *,
            ROW_NUMBER() OVER (PARTITION BY ID, grp ORDER BY order_date DESC) rn
        FROM
            CTE3
    ) t1
WHERE
    rn = 1
GROUP BY
    id,
    name
HAVING
    MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END)
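
To check how the gap_days bucketing above behaves, the sample rows from the question can be inlined as CTEs. This is only a sketch: the params CTE and its fixed reference date of 2022-03-10 are assumptions that stand in for now() so the result is reproducible, and order_date is treated as a plain date column, so date subtraction already yields whole days.

```sql
-- Sketch only: sample rows from the question, with a fixed reference date
-- (2022-03-10, an assumption) standing in for now().
WITH params AS (
    SELECT DATE '2022-03-10' AS as_of
),
orders (id, order_date) AS (
    VALUES (1, DATE '2022-01-15'),
           (2, DATE '2022-02-15'),
           (3, DATE '2022-03-08')
),
order_product (order_id, product_id, unit_price) AS (
    VALUES (1, 1, 10), (1, 2, 20), (1, 3, 10),
           (2, 1, 12), (2, 2, 20), (2, 3, 5),
           (3, 1, 15)
)
SELECT
    op.product_id,
    o.order_date,
    op.unit_price,
    p.as_of - o.order_date AS gap_days,  -- date - date is an integer day count
    -- grp = 1: ordered within the last 30 days; grp = 0: the 30 days before that
    CASE WHEN p.as_of - o.order_date < 30 THEN 1 ELSE 0 END AS grp
FROM order_product op
JOIN orders o ON o.id = op.order_id
CROSS JOIN params p
WHERE o.order_date BETWEEN p.as_of - 60 AND p.as_of
ORDER BY op.product_id, o.order_date;
```

With this reference date, the 2022-01-15 order is 54 days old and lands in grp 0, while the 2022-02-15 and 2022-03-08 orders (23 and 2 days old) land in grp 1, which is the split the desired output relies on.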


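You can use EXTRACT together with the LAG window function to get the number of days between each order_date and the previous order_date for the same product.

Then use SUM as a window function over those gaps (a running total per product) to assign each row to a group:

  • grp = 0: orders within 30 days of the product's earliest order in the 60-day window (the source of previous_unit_price)
  • grp = 1: orders more than 30 days after that earliest order (the source of latest_unit_price)

The query would look like this: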

WITH CTE AS (
 SELECT  "products"."id",
    "products"."name",
    "order_product"."unit_price",
    "orders"."order_date"
 FROM
    products
    JOIN order_product ON products.id = order_product.product_id
    JOIN orders ON order_product.order_id = orders.id
 WHERE
    order_date BETWEEN now() - INTERVAL '60 days'
    AND now()
), CTE2 AS (
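  -- Days between each order and the previous order of the same product (0 for the first).
  -- Assumes order_date is a timestamp; with a date column the subtraction already yields integer days.
  SELECT *,EXTRACT(DAYS FROM  order_date - LAG(order_date,1,order_date) OVER(PARTITION BY id ORDER BY order_date)) gap_days
  FROM CTE 
), CTE3 AS (
  -- Running total of the gaps = days since the product's first order in the window
  SELECT *,(CASE WHEN SUM(gap_days) OVER(PARTITION BY id ORDER BY order_date) > 30 THEN 1 ELSE 0 END) grp
  FROM CTE2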
)
SELECT id,
       name,
       MAX(CASE WHEN grp = 1 THEN order_date END) order_date,
       MAX(CASE WHEN grp = 1 THEN unit_price END) latest_unit_price,
       MAX(CASE WHEN grp = 0 THEN unit_price END) previous_unit_price,
       SUM(CASE WHEN grp = 1 THEN unit_price ELSE - unit_price END) variance
FROM (
  SELECT *,ROW_NUMBER() OVER(PARTITION BY ID,grp ORDER BY order_date DESC) rn
  FROM CTE3
) t1
WHERE rn = 1
GROUP BY id,
         name
HAVING MAX(CASE WHEN grp = 1 THEN unit_price END) <> MAX(CASE WHEN grp = 0 THEN unit_price END) 
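
The LAG gap and the running SUM can also be looked at in isolation. The following is a minimal sketch, not part of the original answer: it inlines the question's sample rows as CTEs, and the gaps CTE, the days_since_first column, and the named window w are hypothetical names introduced for illustration. It also assumes order_date is a date, so the subtraction yields integer days without EXTRACT.

```sql
-- Sketch only: sample rows from the question, inlined as CTEs.
WITH orders (id, order_date) AS (
    VALUES (1, DATE '2022-01-15'),
           (2, DATE '2022-02-15'),
           (3, DATE '2022-03-08')
),
order_product (order_id, product_id, unit_price) AS (
    VALUES (1, 1, 10), (1, 2, 20), (1, 3, 10),
           (2, 1, 12), (2, 2, 20), (2, 3, 5),
           (3, 1, 15)
),
gaps AS (
    SELECT
        op.product_id,
        o.order_date,
        op.unit_price,
        -- date - date is an integer number of days; the first row per product gets 0
        o.order_date - LAG(o.order_date, 1, o.order_date)
            OVER (PARTITION BY op.product_id ORDER BY o.order_date) AS gap_days
    FROM order_product op
    JOIN orders o ON o.id = op.order_id
)
SELECT
    product_id,
    order_date,
    unit_price,
    gap_days,
    SUM(gap_days) OVER w AS days_since_first,  -- running total per product
    CASE WHEN SUM(gap_days) OVER w > 30 THEN 1 ELSE 0 END AS grp
FROM gaps
WINDOW w AS (PARTITION BY product_id ORDER BY order_date)
ORDER BY product_id, order_date;
```

For product 1 the running total is 0, 31, 52, so the first order falls in grp 0 and the two later ones in grp 1. Because the total counts forward from each product's first order rather than back from today, the group boundary does not necessarily coincide with "the last 30 days", which is why the version in the question's update measures the gap from now() instead.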

