I am trying to run a query in Hive. I have an Item table in which each ID has multiple dates associated with it. For each row, I want to find the difference between that row's value and the value at the previous date for the same ID.
ID DATE VALUE
1 01-01-2014 10
1 03-01-2014 05
1 07-01-2014 40
1 05-01-2014 20
2 05-01-2014 10
I would like output of the form:
ID DATE VALUE
1 01-01-2014 10
1 03-01-2014 -5
1 05-01-2014 15
1 07-01-2014 20
2 05-01-2014 10
I tried the following query.
SELECT C.ID ,C.DATE,C.VALUE AS CURRENT_DATE_VALUE,COALESCE(CAST(O.VALUE AS INT),0) AS PREV_DATE_VALUE,(C.VALUE-COALESCE(CAST(O.VALUE as INT),0)) AS DIFF_VALUE
FROM ITEM O
LEFT OUTER JOIN
( SELECT T.ID ,C.DATE,C.VALUE,MAX(UNIX_TIMESTAMP(T.DATE,'dd-MM-yyyy')) AS PREV_DATE
FROM ITEM C
LEFT OUTER JOIN ITEM T ON(C.ID = T.ID) WHERE
UNIX_TIMESTAMP (C.DATE,'dd-MM-yyyy') > UNIX_TIMESTAMP(T.DATE,'dd-MM-yyyy') GROUP BY
T.ID ,C.DATE,C.VALUE) C
ON (O.ID = C.ID AND UNIX_TIMESTAMP (O.DATE,'dd-MM-yyyy') = C.PREV_DATE)
This query does not return rows that have no row for a previous date. Can anyone help me do this with self joins? I'm using a Hive version that does not support windowing functions.
Any help would be appreciated.
First, create the table and load the data into Hive:
use tmp ;
create table t_time(id string,td string,value int) row format delimited fields terminated by '\t' ;
LOAD DATA LOCAL INPATH '/home/hadoop/b.txt' INTO TABLE t_time;
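Second, try the SQL below (the "if" function is the important part):
-- Compare parsed timestamps rather than the raw dd-MM-yyyy strings,
-- which would sort by day before month and year.
select t1.id,t1.td,if(t2.td is null,t1.value,t1.value - t2.value)
from (
-- pre_ts: latest earlier date for the same id, or null when there is none
select a.id,a.td,max(if(unix_timestamp(b.td,'dd-MM-yyyy') < unix_timestamp(a.td,'dd-MM-yyyy'),unix_timestamp(b.td,'dd-MM-yyyy'),null)) pre_ts,a.value
from t_time a join t_time b
on (a.id = b.id)
group by a.id,a.td,a.value
) t1 left outer join t_time t2
on (t1.id = t2.id and t1.pre_ts = unix_timestamp(t2.td,'dd-MM-yyyy'))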
Result
id td _c2
1 01-01-2014 10
1 03-01-2014 -5
1 05-01-2014 15
1 07-01-2014 20
2 05-01-2014 10
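For comparison, on a Hive release that does support windowing functions (0.11 or later), the same difference can probably be written with lag instead of a self join. This is only a sketch under that assumption: it reuses the t_time table from above, and the names ts, s and diff_value are chosen here for illustration.

```sql
-- Sketch only: requires Hive windowing support, which the question's version lacks.
-- ts, s and diff_value are illustrative names, not part of the original answer.
select id,
       td,
       -- lag(value, 1, 0) returns the previous row's value within the same id,
       -- or 0 when there is no previous row, so the first row keeps its own value
       value - lag(value, 1, 0) over (partition by id order by ts) as diff_value
from (
    -- parse the dd-MM-yyyy string once so rows are ordered chronologically
    select id, td, value, unix_timestamp(td, 'dd-MM-yyyy') as ts
    from t_time
) s;
```

The subquery parses the date once so that the window orders rows chronologically rather than by the raw string.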