PostgreSQL: update the old record when a new one is inserted, subject to a time condition

I have a PostgreSQL table to which new records for different items keep being appended:

item      period                     cost    cost_diff
---------------------------------------------------------
 bag    2019-03-15T18:15:00.000Z     100         0
 shoe   2019-03-15T18:15:00.000Z     200         0

When records come in, their cost_diff is 0. But then newer ones arrive, like this:

item      period                     cost    cost_diff
---------------------------------------------------------
 bag    2019-03-15T18:15:00.000Z     100         0
 shoe   2019-03-15T18:15:00.000Z     200         0
 bag    2019-03-15T18:30:00.000Z     150         0
 shoe   2019-03-15T18:45:00.000Z     300         0

The cost_diff of the older record should be updated to (new cost - old cost), but if and only if the new record's period is exactly the next 15-minute slot. Data is inserted at minutes 0, 15, 30 and 45 of each hour.

item      period                     cost    cost_diff
---------------------------------------------------------
 bag    2019-03-15T18:15:00.000Z     100        50 (150-100)
 shoe   2019-03-15T18:15:00.000Z     200         0 (no update)
 bag    2019-03-15T18:30:00.000Z     150         0
 shoe   2019-03-15T18:45:00.000Z     300         0

The table above shows that a newer bag record exactly 15 minutes later (18:15 -> 18:30) has been inserted, so the bag row with period 18:15 gets its cost_diff updated to 50: the cost at 18:30 minus the cost at 18:15, i.e. 150 - 100 = 50. The old shoe row is not updated (still 0) because the newer shoe record is not in the next 15-minute slot (18:15 -> 18:45); it will be updated once a shoe row with period 18:30 is inserted. The same applies to all other records (there are many items, not just shoe and bag as shown).
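To make the rule concrete, here is a minimal Python sketch of it applied to the four sample rows. The function name `fill_cost_diff` and the dict-based row layout are hypothetical, chosen only for illustration, and the sketch assumes each (item, period) pair occurs at most once.

```python
from datetime import datetime, timedelta

STEP = timedelta(minutes=15)  # slot width stated in the question


def fill_cost_diff(rows):
    """Return copies of rows with cost_diff set according to the rule.

    rows: list of dicts with keys "item", "period" (datetime) and "cost".
    Assumes (item, period) is unique.
    """
    # Index costs by (item, period) so the next slot can be looked up directly.
    cost_at = {(r["item"], r["period"]): r["cost"] for r in rows}
    result = []
    for r in rows:
        next_cost = cost_at.get((r["item"], r["period"] + STEP))
        # Only a record exactly 15 minutes later counts; otherwise keep 0.
        diff = next_cost - r["cost"] if next_cost is not None else 0
        result.append({**r, "cost_diff": diff})
    return result


rows = [
    {"item": "bag", "period": datetime(2019, 3, 15, 18, 15), "cost": 100},
    {"item": "shoe", "period": datetime(2019, 3, 15, 18, 15), "cost": 200},
    {"item": "bag", "period": datetime(2019, 3, 15, 18, 30), "cost": 150},
    {"item": "shoe", "period": datetime(2019, 3, 15, 18, 45), "cost": 300},
]

for r in fill_cost_diff(rows):
    print(r["item"], r["period"].isoformat(), r["cost"], r["cost_diff"])
```

Only the 18:15 bag row gets a non-zero cost_diff (50), because it is the only row with a record for the same item exactly 15 minutes later.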

How can I write a query for this, given that records keep arriving in this table? Can it be done purely in SQL, or do I need Python to help? (I am building an ETL pipeline, and this task is part of the transform step.)

Thank you

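You can do this with a query. Use lead():

select t.*,
       -- lead() looks at the same item's next row by period;
       -- the diff applies only when that row is exactly 15 minutes later
       (case when lead(period) over (partition by item order by period) = period + interval '15 minute'
             then lead(cost) over (partition by item order by period) - cost
             else 0
        end) as cost_diff
from t;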
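The query above computes cost_diff at read time. If the value should instead be stored in the table, as the question title suggests, one option is to re-run an UPDATE after each load. The following is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a server, it stores period as text in 'YYYY-MM-DD HH:MM:SS' form, and it assumes (item, period) is unique. The table name t is taken from the answer; the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (item TEXT, period TEXT, cost INTEGER, "
    "cost_diff INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO t (item, period, cost) VALUES (?, ?, ?)",
    [
        ("bag", "2019-03-15 18:15:00", 100),
        ("shoe", "2019-03-15 18:15:00", 200),
        ("bag", "2019-03-15 18:30:00", 150),
        ("shoe", "2019-03-15 18:45:00", 300),
    ],
)

# Re-run after each load: a row gets a non-zero diff only once a row for the
# same item exists exactly 15 minutes later. SQLite dialect; see note below.
conn.execute(
    """
    UPDATE t
    SET cost_diff = COALESCE(
        (SELECT n.cost - t.cost
         FROM t AS n
         WHERE n.item = t.item
           AND n.period = datetime(t.period, '+15 minutes')),
        0)
    """
)

result = conn.execute(
    "SELECT item, period, cost, cost_diff FROM t ORDER BY period, item"
).fetchall()
for row in result:
    print(row)
```

In PostgreSQL, with period stored as a timestamp, the matching condition would be written n.period = t.period + interval '15 minutes' rather than with SQLite's datetime() function.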
