简体   繁体   中英

How to use Hive sql's lag function with two columns?

I feel like I have a fairly simple SQL problem to solve, just don't know how to search for it correctly.

Say I have a table where values are updated based on time:

|timestamp|value|session|
|---------|-----|-------|
| ts1     | v1  |  s1   |
| ts2     | v2  |  s1   |
| ts3     | v3  |  s1   |
| ...     | ..  |  s2   |

I want to get current value and previous value with associated timestamps .

So the result should be:

|timestamp_current|value_current|timestamp_prev|value_prev|
|-----------------|-------------|--------------|----------|
|      ts2        |      v2     |    ts1       |    v1    |
|      ts3        |      v3     |    ts2       |    v2    |
|      ...        |      ..     |    ...       |    ..    |

I think the following query is correct if I was only looking to get the previous value, not the previous timestamp:

select timestamp, value, lag(value,1) over (partition by (session) order by timestamp) from mytable

However, what is the correct way of adding TWO values from the previous row, do I add two lag clauses or is there a better way?

You can derive your result by using lag() twice; once for prev_timestamp and once for prev_val as below.

select * from
(
select timestamp, 
       value, 
       lag(timestamp) over(partition by session order by timestamp) as prev_timestamp, 
       lag(value) over(partition by session order by timestamp) as prev_value
from table1
) t
where prev_timestamp is not null

where clause is used to exclude rows having prev_timestamp as NULL

Result:

+-----------+-------+----------------+------------+
| timestamp | value | prev_timestamp | prev_value |
+-----------+-------+----------------+------------+
| ts2       | v2    | ts1            | v1         |
| ts3       | v3    | ts2            | v2         |
+-----------+-------+----------------+------------+

DEMO

One another way to achieve this would be to use the row_number function and joining the records as shown below.. But using two lag methods in the same query will mostly likely be more performant than row_number and left join method.

WITH dt AS (SELECT  timestamp, value, ROW_NUMBER() OVER(PARTITION BY session ORDER BY timestamp) as row_num FROM table1)
SELECT 
  t0.timestamp, 
  t0.value, 
  t1.timestamp as prev_timestamp,
  t1.value as prev_value
FROM dt t0 
LEFT OUTER JOIN dt t1
ON t0.row_num = t1.row_num - 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM