I have a PostgreSQL table that keeps getting new records appended for different items:
item  period                    cost  cost_diff
---------------------------------------------------------
bag   2019-03-15T18:15:00.000Z  100   0
shoe  2019-03-15T18:15:00.000Z  200   0
When records first come in, their cost_diff is 0. Later, new records arrive like this:
item  period                    cost  cost_diff
---------------------------------------------------------
bag   2019-03-15T18:15:00.000Z  100   0
shoe  2019-03-15T18:15:00.000Z  200   0
bag   2019-03-15T18:30:00.000Z  150   0
shoe  2019-03-15T18:45:00.000Z  300   0
The cost_diff of an older record should then be updated to (new cost - old cost), but if and only if the new record's period is exactly the next 15-minute slot. Data is inserted at minutes 0, 15, 30 and 45 of each hour.
item  period                    cost  cost_diff
---------------------------------------------------------
bag   2019-03-15T18:15:00.000Z  100   50 (150-100)
shoe  2019-03-15T18:15:00.000Z  200   0 (no update)
bag   2019-03-15T18:30:00.000Z  150   0
shoe  2019-03-15T18:45:00.000Z  300   0
The table above shows that a newer bag record exactly 15 minutes later (18:15 -> 18:30) was inserted, so the bag row with period 18:15 gets its cost_diff updated to 50, i.e. the cost at 18:30 minus the cost at 18:15: 150 - 100 = 50. The old shoe row is not updated (still 0) because the newer shoe record is not in the next 15-minute slot (18:15 -> 18:45); it will be updated once a shoe row with period 18:30 is inserted. The same applies to all other records (there are many items, not just the shoe and bag shown here).
How can I write a query for this, given that records keep coming into this table? Can it be done purely in SQL, or do I need Python to help? (I am building an ETL pipeline, and this task is part of the transform step.)
Thank you
You can do this with a query. Use lead():
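-- cost_diff is computed on read: the next row's cost minus this row's cost,
-- but only when the next row for the same item is exactly 15 minutes later
select t.item, t.period, t.cost,
       (case when lead(period) over (partition by item order by period) = period + interval '15 minute'
             then lead(cost) over (partition by item order by period) - cost
             else 0
        end) as cost_diff
from t;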
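If you would rather do this in the Python transform step of the pipeline, here is a minimal sketch of the same rule. It is not a definitive implementation: the function name add_cost_diff and the list-of-dicts row shape are my own assumptions for illustration, and it assumes at most one row per (item, period). Instead of lead(), it looks up the row exactly 15 minutes later for the same item, which gives the same result under that assumption.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # records arrive at minutes 0, 15, 30 and 45


def add_cost_diff(rows):
    """Return copies of rows with cost_diff filled in (hypothetical helper).

    cost_diff is (next cost - this cost) when a row for the same item exists
    exactly 15 minutes later, otherwise 0.
    """
    # Assumption: (item, period) is unique, so it can serve as a lookup key.
    cost_at = {(r["item"], r["period"]): r["cost"] for r in rows}
    result = []
    for r in rows:
        next_cost = cost_at.get((r["item"], r["period"] + STEP))
        diff = next_cost - r["cost"] if next_cost is not None else 0
        result.append({**r, "cost_diff": diff})
    return result


# Sample data taken from the question
rows = [
    {"item": "bag", "period": datetime(2019, 3, 15, 18, 15), "cost": 100},
    {"item": "shoe", "period": datetime(2019, 3, 15, 18, 15), "cost": 200},
    {"item": "bag", "period": datetime(2019, 3, 15, 18, 30), "cost": 150},
    {"item": "shoe", "period": datetime(2019, 3, 15, 18, 45), "cost": 300},
]

result = add_cost_diff(rows)
for r in result:
    print(r["item"], r["period"].isoformat(), r["cost"], r["cost_diff"])
```

On the sample data only the bag row at 18:15 gets a non-zero cost_diff (150 - 100 = 50). The shoe row at 18:15 stays 0 because there is no shoe row at 18:30.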