
SQL Hive - Calculate the rolling SUM and AVG for previous months

I need to calculate the sum and average of amt over the past 6 and 12 months for each Id and dt. I tried using OVER (PARTITION BY ...) and CASE statements, but I'm not getting the expected output.

Id             dt            amt
11112222222    2018-03-01    100
11112222222    2018-03-01    100
**11112222222    2017-03-01    100**
11112222222    2017-09-01    100
11112222222    2017-03-01    300
11112222222    2018-01-01    100
11112222222    2018-05-01    200
**11112222222    2016-03-01    450**
11112222222    2018-04-01    500 

Expected output:

Id             Dt            Sum6mon    Avg6mon    Sum12mon    Avg12mon
11112222222    2018-03-01    400        150        1500        187.5
11112222222    2017-03-01    100        100        550         275
etc...

Date format: YYYY-MM-DD
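To make the sample reproducible, here is a minimal sketch that loads the rows above into a Hive table. The table name `t` and the column types are assumptions (adjust them to your real schema); `id` has to be BIGINT because 11112222222 is larger than the INT range, and `INSERT ... VALUES` needs Hive 0.14 or later.

```sql
-- Hypothetical table name and column types; adjust to your schema.
-- id is BIGINT because 11112222222 exceeds the INT range.
CREATE TABLE t (id BIGINT, dt STRING, amt INT);

-- dt is kept as a 'yyyy-MM-dd' string, so string comparison sorts chronologically.
INSERT INTO TABLE t VALUES
  (11112222222, '2018-03-01', 100),
  (11112222222, '2018-03-01', 100),
  (11112222222, '2017-03-01', 100),
  (11112222222, '2017-09-01', 100),
  (11112222222, '2017-03-01', 300),
  (11112222222, '2018-01-01', 100),
  (11112222222, '2018-05-01', 200),
  (11112222222, '2016-03-01', 450),
  (11112222222, '2018-04-01', 500);
```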

If you need the sum and average of amt for every distinct {Id, dt} pair, you could try the following query:

   SELECT
   Id,
   dt,
   -- "last 6 months" is approximated as 6 * 30 days before now
   SUM(CASE WHEN dt >= from_unixtime(unix_timestamp() - 3600 * 24 * 30 * 6, 'yyyy-MM-dd') THEN amt ELSE 0 END) as Sum6mon,
   SUM(amt) as Sum12mon,
   -- no ELSE branch: dates outside the 6-month range yield NULL (ignored by AVG) instead of a misleading 0
   AVG(CASE WHEN dt >= from_unixtime(unix_timestamp() - 3600 * 24 * 30 * 6, 'yyyy-MM-dd') THEN amt END) as Avg6mon,
   AVG(amt) as Avg12mon
   FROM
   <your table name>
   -- keep only the last ~12 months (12 * 30 days) up to today
   WHERE
   dt BETWEEN from_unixtime(unix_timestamp() - 3600 * 24 * 30 * 12, 'yyyy-MM-dd') AND from_unixtime(unix_timestamp(), 'yyyy-MM-dd')
   GROUP BY Id, dt
   ;

Keep in mind, though, that from_unixtime(unix_timestamp()) is quite slow, so replace it with static dates whenever possible.
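Following that advice, here is a sketch of the same idea with a fixed reference date instead of from_unixtime(unix_timestamp()). The Hive variable name `ref_date`, its value 2018-05-01 (the latest date in the sample), and the table name `t` are assumptions. It uses Hive's built-in `add_months`, which steps back whole calendar months rather than 30-day blocks.

```sql
-- Assumption: the reference date is passed in as a Hive variable
-- (hypothetical name ref_date) rather than computed from unix_timestamp().
SET hivevar:ref_date=2018-05-01;

SELECT
  id,
  dt,
  -- add_months returns a 'yyyy-MM-dd' string, comparable with the dt string
  SUM(CASE WHEN dt >= add_months('${hivevar:ref_date}', -6) THEN amt ELSE 0 END) AS sum6mon,
  SUM(amt) AS sum12mon,
  -- no ELSE branch: rows outside the 6-month range yield NULL, which AVG ignores
  AVG(CASE WHEN dt >= add_months('${hivevar:ref_date}', -6) THEN amt END) AS avg6mon,
  AVG(amt) AS avg12mon
-- hypothetical table name
FROM t
WHERE dt BETWEEN add_months('${hivevar:ref_date}', -12) AND '${hivevar:ref_date}'
GROUP BY id, dt;
```

Because the query still groups by (id, dt), each output row only aggregates the rows of that single date; it filters relative to one fixed date rather than computing a per-row rolling window, which is what the analytic-function answer addresses.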

"tried using over partition by"

Yes, analytic (window) functions are the recommended tool for rolling sums and averages. I think the best approach is a RANGE windowing clause instead of CASE WHEN:

-- each frame covers rows of the same id whose dt lies between (dt - interval) and dt;
-- rows sharing the same dt are included, so duplicate dates get identical results
select id, dt, amt,
       sum(amt) over (partition by id order by dt range interval '6'  month preceding) s06,
       avg(amt) over (partition by id order by dt range interval '6'  month preceding) a06,
       sum(amt) over (partition by id order by dt range interval '12' month preceding) s12,
       avg(amt) over (partition by id order by dt range interval '12' month preceding) a12
  from t order by dt

SqlFiddle demo
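The INTERVAL offset in the RANGE frame above is ANSI SQL syntax that Oracle accepts, presumably the engine behind the SqlFiddle demo. If your Hive version rejects an INTERVAL offset in a window frame, one workaround is to order by an integer month index and use a plain numeric RANGE offset. This is a sketch under stated assumptions: `month_idx` is a hypothetical helper column, the table is named `t`, and every dt falls on the first of a month (as in the sample). For arbitrary days it only works at month granularity.

```sql
-- Hypothetical helper column month_idx = year * 12 + month, so that
-- "6 PRECEDING" on it means "the same month and the 6 calendar months before".
-- Assumes dt is a 'yyyy-MM-dd' string on the first day of a month.
SELECT id, dt, amt,
       SUM(amt) OVER (PARTITION BY id ORDER BY month_idx RANGE BETWEEN 6 PRECEDING AND CURRENT ROW)  AS s06,
       AVG(amt) OVER (PARTITION BY id ORDER BY month_idx RANGE BETWEEN 6 PRECEDING AND CURRENT ROW)  AS a06,
       SUM(amt) OVER (PARTITION BY id ORDER BY month_idx RANGE BETWEEN 12 PRECEDING AND CURRENT ROW) AS s12,
       AVG(amt) OVER (PARTITION BY id ORDER BY month_idx RANGE BETWEEN 12 PRECEDING AND CURRENT ROW) AS a12
FROM (
  SELECT id, dt, amt, year(dt) * 12 + month(dt) AS month_idx
  FROM t
) x
ORDER BY dt;
```

Ordering by the integer month_idx instead of the dt string is what lets a numeric RANGE offset stand in for the month interval.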

If you provide sample data, please also attach the matching expected output. In this case the sum for 2018-03-01 cannot be 1500: all rows up to and including that date add up to only 1250. Matching output lets us verify our results and respond :) It's also helpful to show your code and efforts.
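As a worked check of that claim, here are the sums for Id 11112222222 at dt = 2018-03-01, taken directly from the sample rows. Both rows dated 2018-03-01 are counted, as the RANGE frame includes rows sharing the same dt; the window bounds follow the `interval 'n' month preceding` frames of the query above.

```latex
\begin{aligned}
\text{all rows with } dt \le \text{2018-03-01} &: 100+100+100+100+300+100+450 = 1250 < 1500 \\
\text{12-month window } [\text{2017-03-01},\ \text{2018-03-01}] &: 100+100+100+100+300+100 = 800, \quad \bar{x} = 800/6 \approx 133.33 \\
\text{6-month window } [\text{2017-09-01},\ \text{2018-03-01}] &: 100+100+100+100 = 400, \quad \bar{x} = 400/4 = 100
\end{aligned}
```

So the expected Sum6mon of 400 agrees with the sample data, while the expected Sum12mon of 1500 does not.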
