Picking minValue and its row in hive

Question

I need to pick the minimum value over a sliding 2-hour window, along with its corresponding date value. For example:

Create table stock(time string, cost float);

Insert into stock values("1990-01-01 8:00 AM",4.5);
Insert into stock values("1990-01-01 9:00 AM",3.2);
Insert into stock values("1990-01-01 10:00 AM",3.1);
Insert into stock values("1990-01-01 11:00 AM",5.5);
Insert into stock values("1990-01-02 8:00 AM",5.1);
Insert into stock values("1990-01-02 9:00 AM",2.2);
Insert into stock values("1990-01-02 10:00 AM",1.5);
Insert into stock values("1990-01-02 11:00 AM",6.5);
Insert into stock values("1990-01-03 8:00 AM",8.1);
Insert into stock values("1990-01-03 9:00 AM",3.2);
Insert into stock values("1990-01-03 10:00 AM",2.5);
Insert into stock values("1990-01-03 11:00 AM",4.5);

To get the minimum, I can write a query like this:

select min(cost) over(order by unix_timestamp(time) range between current row and 7200 following)
from stock

So, from the current row, look ahead by 2 hours (7200 seconds) and pick the minimum. For the first row, the minimum value is 3.1, located in the third row at 10:00 AM. I get the right minimum value with this query, but I also need the corresponding date value for that minimum; in this case, I want "1990-01-01 10:00 AM". How can I pick this?

Thanks, Raj

Answer 1

I think this is a hard problem. One approach is to join to find the value:

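-- If time is stored as in the sample data ('1990-01-01 8:00 AM'), unix_timestamp
-- may need an explicit pattern such as 'yyyy-MM-dd h:mm a'.
select s.*, smin.time as min_time
from (select s.*,
             min(cost) over (order by unix_timestamp(time) range between current row and 7200 following) as min_cost
      from stock s
     ) s join
     stock smin
     on smin.cost = s.min_cost and
        unix_timestamp(smin.time) >= unix_timestamp(s.time) and
        unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200  -- inclusive, to match the window's upper bound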

The downside of this approach is that it can produce duplicates when several rows in the window tie for the minimum cost. If that is an issue, keep a single matching row per source row:

select s.*
from (select s.*, smin.time as min_time,
             -- order chronologically; ordering by the raw string would put '10:00 AM' before '8:00 AM'
             row_number() over (partition by s.time order by unix_timestamp(smin.time)) as seqnum
      from (select s.*,
                   min(cost) over (order by unix_timestamp(time) range between current row and 7200 following) as min_cost
            from stock s
           ) s join
           stock smin
           on smin.cost = s.min_cost and
              unix_timestamp(smin.time) >= unix_timestamp(s.time) and
              unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200  -- inclusive, to match the window's upper bound
       ) s
where seqnum = 1;
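
To sanity-check what a query like this should return on the sample data from the question, here is a small reference computation in plain Python (not Hive). It is only a sketch: the names ROWS, WINDOW, parse and window_min are made up for illustration, the window is assumed to be inclusive at both ends (as with "range between current row and 7200 following"), and ties are assumed to go to the earliest row.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (time string, cost).
ROWS = [
    ("1990-01-01 8:00 AM", 4.5),
    ("1990-01-01 9:00 AM", 3.2),
    ("1990-01-01 10:00 AM", 3.1),
    ("1990-01-01 11:00 AM", 5.5),
    ("1990-01-02 8:00 AM", 5.1),
    ("1990-01-02 9:00 AM", 2.2),
    ("1990-01-02 10:00 AM", 1.5),
    ("1990-01-02 11:00 AM", 6.5),
    ("1990-01-03 8:00 AM", 8.1),
    ("1990-01-03 9:00 AM", 3.2),
    ("1990-01-03 10:00 AM", 2.5),
    ("1990-01-03 11:00 AM", 4.5),
]

# Look-ahead window of 2 hours (7200 seconds), inclusive at both ends.
WINDOW = timedelta(hours=2)


def parse(ts):
    """Parse strings such as '1990-01-01 8:00 AM' into datetime objects."""
    return datetime.strptime(ts, "%Y-%m-%d %I:%M %p")


def window_min(rows, window=WINDOW):
    """For each row, return (time, cost, min_cost, min_time) over [time, time + window]."""
    parsed = [(parse(t), t, c) for t, c in rows]
    result = []
    for start, time_str, cost in parsed:
        # Candidates are keyed by (cost, datetime) so that ties on cost
        # are resolved in favour of the earliest row.
        candidates = [
            (c, dt, t) for dt, t, c in parsed if start <= dt <= start + window
        ]
        min_cost, _, min_time = min(candidates)
        result.append((time_str, cost, min_cost, min_time))
    return result


for row in window_min(ROWS):
    print(row)
```

For the first row (8:00 AM on 1990-01-01) this gives a minimum of 3.1 at "1990-01-01 10:00 AM", which matches the result the question asks for. It also shows why the upper bound of the join has to be inclusive: the minimum sits exactly 7200 seconds after the current row.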

solution1
2 ACCEPTED 2019-11-29 22:19:36