I have a table that contains numbers. I need to find out whether there is any case where n consecutive numbers are greater than some threshold value m. For example:
id    delta
-----------
1     10
4     15
11    22
23    23
46    21
57    9
So here, if I want to know whether there are 3 consecutive records where the value is more than 20, I should get True, and False when I check for 4 consecutive records. Is that possible? This is on Apache Spark SQL. Thanks.
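You can do this using lag(). In the queries below, t is your table and val is the column being tested (delta in your example):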
select t.*
from (select t.*,
             lag(val, 1) over (order by id) as val_1,
             lag(val, 2) over (order by id) as val_2
      from t
     ) t
where val > 20 and val_1 > 20 and val_2 > 20;
This returns the last row of each run of three consecutive rows above the threshold (lag() looks backward, so the matching row is the third of the three). If you just want true/false:
select (case when count(*) > 0 then 'true' else 'false' end)
from (select t.*,
             lag(val, 1) over (order by id) as val_1,
             lag(val, 2) over (order by id) as val_2
      from t
     ) t
where val > 20 and val_1 > 20 and val_2 > 20;
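As a quick check of the lag() logic against the sample data from the question, here is a small sketch that runs the same query through Python's built-in sqlite3 module. SQLite stands in for Spark SQL purely to keep the example self-contained (this assumes SQLite 3.25 or later, which added window functions). The table t and column val follow the answer's naming, with val holding the question's delta values.

```python
import sqlite3

# Sample data from the question; `val` holds the question's `delta` column.
ROWS = [(1, 10), (4, 15), (11, 22), (23, 23), (46, 21), (57, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, val integer)")
conn.executemany("insert into t values (?, ?)", ROWS)

# Same shape as the answer's query: compare each row with the two rows before it.
LAG_QUERY = """
select t.id
from (select t.*,
             lag(val, 1) over (order by id) as val_1,
             lag(val, 2) over (order by id) as val_2
      from t
     ) t
where val > 20 and val_1 > 20 and val_2 > 20
"""

matching_ids = [row[0] for row in conn.execute(LAG_QUERY)]
has_run_of_three = len(matching_ids) > 0
print(matching_ids, has_run_of_three)
```

On the sample data the only match is id 46, the last row of the run 22, 23, 21, so the true/false form of the query yields true.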
EDIT:
I missed the part about not wanting more than 3. To match only runs of exactly three rows, you can extend the query:
select (case when count(*) > 0 then 'true' else 'false' end)
from (select t.*,
             lag(val, 1) over (order by id) as val_1,
             lag(val, 2) over (order by id) as val_2,
             lag(val, 3) over (order by id) as val_3,
             lead(val, 1) over (order by id) as val_next_1
      from t
     ) t
where (val_3 <= 20 or val_3 is null) and
      (val_2 > 20 and val_1 > 20 and val > 20) and
      (val_next_1 <= 20 or val_next_1 is null);
It is a little tricky because the run can be at the beginning or end of the table, where lag() and lead() return NULL; that is what the "is null" checks handle.
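For a general n, one alternative to adding more lag() columns is a sliding window frame: take the minimum and the row count over the current row plus the n - 1 rows before it, and look for a frame that is full and whose minimum exceeds m. This is a sketch and not part of the original answer. It checks for at least n consecutive rows (the reading in the question), and it again uses SQLite through Python only so that it runs standalone. The helper name has_consecutive is hypothetical. Spark SQL also supports "rows between ... preceding and current row" frames, but the exact query has not been run there.

```python
import sqlite3

# Sample data from the question as (id, val) pairs.
SAMPLE = [(1, 10), (4, 15), (11, 22), (23, 23), (46, 21), (57, 9)]


def has_consecutive(rows, n, m):
    """Return True if some n consecutive rows (ordered by id) all have val > m."""
    n = int(n)
    if n < 1:
        raise ValueError("n must be at least 1")
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (id integer, val integer)")
        conn.executemany("insert into t values (?, ?)", rows)
        # The frame covers the current row plus the n - 1 rows before it.
        # count(val) skips NULLs, so a frame holding a NULL never reaches n.
        query = f"""
            select count(*) > 0
            from (select min(val) over w as min_val,
                         count(val) over w as cnt
                  from t
                  window w as (order by id
                               rows between {n - 1} preceding and current row)
                 ) s
            where cnt = ? and min_val > ?
        """
        (found,) = conn.execute(query, (n, m)).fetchone()
    finally:
        conn.close()
    return bool(found)


print(has_consecutive(SAMPLE, 3, 20))
print(has_consecutive(SAMPLE, 4, 20))
```

With the sample data this prints True for n = 3 and False for n = 4, matching the results the question expects. Only the frame size changes with n, so the query stays the same length.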