Picking minValue and its row in hive

I need to pick the minimum value over a sliding 2-hour time window, together with its corresponding date value. For example:

Create table stock(time string, cost float);

Insert into stock values("1990-01-01 8:00 AM",4.5);
Insert into stock values("1990-01-01 9:00 AM",3.2);
Insert into stock values("1990-01-01 10:00 AM",3.1);
Insert into stock values("1990-01-01 11:00 AM",5.5);
Insert into stock values("1990-01-02 8:00 AM",5.1);
Insert into stock values("1990-01-02 9:00 AM",2.2);
Insert into stock values("1990-01-02 10:00 AM",1.5);
Insert into stock values("1990-01-02 11:00 AM",6.5);
Insert into stock values("1990-01-03 8:00 AM",8.1);
Insert into stock values("1990-01-03 9:00 AM",3.2);
Insert into stock values("1990-01-03 10:00 AM",2.5);
Insert into stock values("1990-01-03 11:00 AM",4.5);

For this, I can write a query like this:

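-- the format string is needed because the times are not in Hive's default 'yyyy-MM-dd HH:mm:ss' format
select min(cost) over (order by unix_timestamp(time, 'yyyy-MM-dd h:mm a')
                       range between current row and 7200 following)
from stock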

So, from the current row, the window looks ahead by 2 hours (7200 seconds) and picks the minimum. For the first row, the minimum is 3.1, located in the third row at 10:00 AM. This query gives me the right minimum value, but I also need the date value that corresponds to it; in this case, I want "1990-01-01 10:00 AM". How can I pick this?

Thanks, Raj

I think this is a hard problem. One approach is to join back to the table to find the row that holds the minimum:

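select s.time, s.cost, s.min_cost, smin.time as min_time
from (select time, cost,
             unix_timestamp(time, 'yyyy-MM-dd h:mm a') as ts,
             min(cost) over (order by unix_timestamp(time, 'yyyy-MM-dd h:mm a')
                             range between current row and 7200 following) as min_cost
      from stock
     ) s join
     stock smin
     -- non-equality conditions in ON need Hive 2.2.0 or later;
     -- on older versions, move them into a WHERE clause
     on smin.cost = s.min_cost and
        -- inclusive upper bound, matching the "7200 following" frame
        unix_timestamp(smin.time, 'yyyy-MM-dd h:mm a') between s.ts and s.ts + 7200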

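The downside of this approach is that it can produce duplicates when the minimum cost occurs more than once in the same window. If that is an issue, number the matches and keep only the earliest one: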

select time, cost, min_cost, min_time
from (select s.time, s.cost, s.min_cost, smin.time as min_time,
             -- order by the parsed time, not the string, so 10:00 AM sorts after 9:00 AM
             row_number() over (partition by s.time
                                order by unix_timestamp(smin.time, 'yyyy-MM-dd h:mm a')) as seqnum
      from (select time, cost,
                   unix_timestamp(time, 'yyyy-MM-dd h:mm a') as ts,
                   min(cost) over (order by unix_timestamp(time, 'yyyy-MM-dd h:mm a')
                                   range between current row and 7200 following) as min_cost
            from stock
           ) s join
           stock smin
           on smin.cost = s.min_cost and
              unix_timestamp(smin.time, 'yyyy-MM-dd h:mm a') between s.ts and s.ts + 7200
     ) s
where seqnum = 1;
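Another possible approach, not part of the original answer, is to avoid the join entirely by carrying the time inside the value being minimized. This is a sketch that assumes your Hive version accepts a struct argument to the windowed min and compares structs field by field (cost first, then the epoch seconds ts), so the smallest cost wins and ties go to the earliest row. The struct field names cost, ts and src_time are illustrative choices, and this has not been checked on every Hive release.

```sql
-- Sketch: 2-hour sliding minimum plus the time at which it occurs, no self-join.
-- Assumes Hive compares structs field by field: smallest cost first,
-- then the earliest ts (epoch seconds) on ties.
select s.time,
       s.cost,
       s.min_pair.cost     as min_cost,
       s.min_pair.src_time as min_time
from (select time,
             cost,
             min(named_struct('cost', cost,
                              'ts', unix_timestamp(time, 'yyyy-MM-dd h:mm a'),
                              'src_time', time))
               over (order by unix_timestamp(time, 'yyyy-MM-dd h:mm a')
                     range between current row and 7200 following) as min_pair
      from stock
     ) s;
```

For the first row (1990-01-01 8:00 AM), this should give min_cost 3.1 and min_time "1990-01-01 10:00 AM", matching the example in the question.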
