Hive fetch previous row with Maximum value for a column

Question

I am trying to run a query in Hive. I have an ITEM table in which each ID has multiple dates associated with it. For each row, I want to find the difference between that row's value and the value on the previous date for the same ID.

ID        DATE           VALUE
1         01-01-2014       10
1         03-01-2014       05
1         07-01-2014       40
1         05-01-2014       20
2         05-01-2014       10

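I would like output of the following form, where VALUE is the difference from the previous date's value (or the value itself when there is no previous date):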

ID        DATE           VALUE
1         01-01-2014       10
1         03-01-2014       -5
1         05-01-2014       15
1         07-01-2014       20
2         05-01-2014       10

I tried the following query.

SELECT C.ID ,C.DATE,C.VALUE AS CURRENT_DATE_VALUE,COALESCE(CAST(O.VALUE AS INT),0) AS PREV_DATE_VALUE,(C.VALUE-COALESCE(CAST(O.VALUE as INT),0)) AS DIFF_VALUE 
FROM ITEM O 
LEFT OUTER JOIN 
( SELECT T.ID ,C.DATE,C.VALUE,MAX(UNIX_TIMESTAMP(T.DATE,'dd-MM-yyyy')) AS PREV_DATE 
  FROM ITEM C 
  LEFT OUTER JOIN ITEM T ON(C.ID = T.ID) WHERE   
  UNIX_TIMESTAMP (C.DATE,'dd-MM-yyyy') > UNIX_TIMESTAMP(T.DATE,'dd-MM-yyyy') GROUP BY
  T.ID ,C.DATE,C.VALUE) C 
ON (O.ID = C.ID AND UNIX_TIMESTAMP (O.DATE,'dd-MM-yyyy') = C.PREV_DATE)

This query does not return rows that have no row for a previous date. Can anyone help me with this using self joins? I'm using a Hive version that does not support windowing functions.

Any help would be appreciated.

Answer 1

First, create the table and load the data into Hive:

use tmp ;
create table t_time(id string,td string,value int) row format delimited fields terminated by '\t' ;
LOAD DATA LOCAL INPATH '/home/hadoop/b.txt' INTO TABLE t_time;

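Second, try the SQL below. The "if" inside max() is the important part: it keeps only dates earlier than the current row's date, so max() returns the previous date, or NULL when there is none.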

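-- Dates are parsed with unix_timestamp() so they compare chronologically;
-- comparing the raw 'dd-MM-yyyy' strings would order them by day of month first.
-- pre_ts is the latest date (as a Unix timestamp) before a.td for the same id, else NULL.
select t1.id, t1.td, if(t2.td is null, t1.value, t1.value - t2.value)
from (
  select a.id, a.td, a.value,
         max(if(unix_timestamp(b.td, 'dd-MM-yyyy') < unix_timestamp(a.td, 'dd-MM-yyyy'),
                unix_timestamp(b.td, 'dd-MM-yyyy'), null)) pre_ts
  from t_time a join t_time b
  on (a.id = b.id)
  group by a.id, a.td, a.value
) t1 left outer join t_time t2
on (t1.id = t2.id and t1.pre_ts = unix_timestamp(t2.td, 'dd-MM-yyyy'));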
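
To see what the max(if(...)) step contributes, the inner subquery can be run by itself. The sketch below is an illustration rather than part of the answer as posted: it assumes the t_time table created above, parses the dates with unix_timestamp so that the comparison is chronological, and uses pre_ts as a column name chosen for the example.

```sql
-- Illustration only: the inner step of the self-join approach, run by itself.
-- Assumes t_time(id, td, value) with td stored as a 'dd-MM-yyyy' string.
-- pre_ts is a column name chosen for this sketch.
select a.id, a.td, a.value,
       max(if(unix_timestamp(b.td, 'dd-MM-yyyy') < unix_timestamp(a.td, 'dd-MM-yyyy'),
              unix_timestamp(b.td, 'dd-MM-yyyy'), null)) pre_ts
from t_time a join t_time b
on (a.id = b.id)
group by a.id, a.td, a.value;
```

For the sample data, the earliest date of each id should come back with a NULL pre_ts, and every other row should carry the timestamp of the date just before it. Joining these rows back to t_time, as the full query does, yields the final differences.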

Result

id          td          _c2
1       01-01-2014      10
1       03-01-2014      -5
1       05-01-2014      15
1       07-01-2014      20
2       05-01-2014      10
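
For readers on Hive 0.11 or later, where windowing functions are available, the same result can probably be obtained without a self join. This does not help on the asker's version and is not part of the original answer; it is a minimal sketch that assumes the t_time table defined above and dates in dd-MM-yyyy format.

```sql
-- Sketch for Hive 0.11+ only (windowing functions required).
-- Assumes t_time(id, td, value) with td stored as a 'dd-MM-yyyy' string.
-- lag() returns NULL for the first date of each id; coalesce turns that into 0,
-- so the first row of each id keeps its own value.
select id,
       td,
       value - coalesce(lag(value, 1) over (partition by id
                                            order by unix_timestamp(td, 'dd-MM-yyyy')), 0)
from t_time;
```

Ordering the window by the parsed timestamp rather than by the raw string keeps the "previous" row chronological when the dates span more than one month.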
