Say we have a table:
CREATE TABLE p
(
id serial NOT NULL,
val boolean NOT NULL,
PRIMARY KEY (id)
);
Populated with some rows:
insert into p (val)
values (true),(false),(false),(true),(true),(true),(false);
ID VAL
1 1
2 0
3 0
4 1
5 1
6 1
7 0
I want to find the rows where the value changes compared to the previous row (ordered by id). So the result of my query should be:
ID VAL
2 0
4 1
7 0
I have a solution with joins and subqueries:
select min(id) id, val from
(
select p1.id, p1.val, max(p2.id) last_prev
from p p1
join p p2
on p2.id < p1.id and p2.val != p1.val
group by p1.id, p1.val
) tmp
group by val, last_prev
order by id;
But it is very inefficient and will be extremely slow for tables with many rows.
I believe there could be a more efficient solution using PostgreSQL window functions?
This is how I would do it with an analytic function:
SELECT id, val
FROM ( SELECT id, val
,LAG(val) OVER (ORDER BY id) AS prev_val
FROM p ) x
WHERE val <> COALESCE(prev_val, val)
ORDER BY id
Update (some explanation):
Analytic functions operate as a post-processing step. The query result is broken into groupings (partition by) and the analytic function is applied within the context of a grouping.
In this case, the query is a selection from p. The analytic function being applied is LAG. Since there is no partition by clause, there is only one grouping: the entire result set. This grouping is ordered by id. LAG returns the value of the previous row in the grouping using the specified order. The result is each row having an additional column (aliased prev_val) which is the val of the preceding row. That is the subquery.
Then we look for rows where the val does not match the val of the previous row (prev_val). The COALESCE handles the special case of the first row which does not have a previous value.
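To make the subquery step concrete, the inner query can be run on its own against the sample table p from the question. The comments list what prev_val comes out as for the seven sample rows; this only illustrates the intermediate result and is not a different method.

```sql
-- Inner query only: each row is paired with the val of the row before it.
SELECT id, val,
       LAG(val) OVER (ORDER BY id) AS prev_val
FROM p
ORDER BY id;

-- With the sample data from the question:
--  id | val | prev_val
--   1 | t   | NULL      (no previous row)
--   2 | f   | t         <- differs: a change
--   3 | f   | f
--   4 | t   | f         <- differs: a change
--   5 | t   | t
--   6 | t   | t
--   7 | f   | t         <- differs: a change
```

The outer WHERE clause then keeps only the rows marked as changes, and COALESCE turns the NULL of the first row into "same as current", so id 1 is not reported.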
Analytic functions may seem a bit strange at first, but a search on analytic functions finds a lot of examples walking through how they work. For example: http://www.cs.utexas.edu/~cannata/dbms/Analytic%20Functions%20in%20Oracle%208i%20and%209i.htm
Just remember that it is a post-processing step. You won't be able to filter on the value of an analytic function unless you wrap it in a subquery.
Instead of calling COALESCE, you can provide a default from the window function lag() directly. A minor detail in this case since all columns are defined NOT NULL. But this may be essential to distinguish "no previous row" from "NULL in previous row".
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Compute the result of the comparison immediately, since the previous value is not of interest per se, only a possible change. Shorter and may be a tiny bit faster.
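The difference between "no previous row" and "NULL in previous row" only shows up once the column is nullable. The following sketch assumes a hypothetical temp table p_nullable (not part of the question) and uses IS DISTINCT FROM so that NULL values compare predictably:

```sql
-- Hypothetical table: same shape as p, but val may be NULL.
CREATE TEMP TABLE p_nullable (
    id  serial PRIMARY KEY,
    val boolean
);
INSERT INTO p_nullable (val) VALUES (true), (NULL), (true);

SELECT id, val,
       -- plain lag(): NULL stands for both "no row" and "NULL value"
       lag(val) OVER w IS DISTINCT FROM val         AS changed_plain,
       -- 3-argument lag(): the default applies only when there is no previous row
       lag(val, 1, val) OVER w IS DISTINCT FROM val AS changed_with_default
FROM p_nullable
WINDOW w AS (ORDER BY id)
ORDER BY id;

--  id | val  | changed_plain | changed_with_default
--   1 | t    | t             | f     (no previous row: default = current val)
--   2 | NULL | t             | t
--   3 | t    | t             | t     (previous val really is NULL)
```

Only the first row differs between the two columns, which is the case the default argument is meant to cover.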
If you consider the first row to be "changed" (unlike your demo output suggests), you need to observe NULL values - even though your columns are defined NOT NULL. Basic lag() returns NULL in case there is no previous row:
SELECT id, val
FROM (
SELECT id, val, lag(val) OVER (ORDER BY id) IS DISTINCT FROM val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
Or employ the additional parameters of lag() once again:
SELECT id, val
FROM (
SELECT id, val, lag(val, 1, NOT val) OVER (ORDER BY id) <> val AS changed
FROM p
) sub
WHERE changed
ORDER BY id;
A recursive CTE, as proof of concept. :) Its performance won't keep up with the other posted alternatives.
WITH RECURSIVE cte AS (
SELECT id, val
FROM p
WHERE NOT EXISTS (
SELECT 1
FROM p p0
WHERE p0.id < p.id
)
UNION ALL
SELECT p.id, p.val
FROM cte
JOIN p ON p.id > cte.id
AND p.val <> cte.val
WHERE NOT EXISTS (
SELECT 1
FROM p p0
WHERE p0.id > cte.id
AND p0.val <> cte.val
AND p0.id < p.id
)
)
SELECT * FROM cte;
With an improvement from @wildplasser.
SQL Fiddle demonstrating all.
Can even be done without window functions.
SELECT * FROM p p0
WHERE EXISTS (
SELECT * FROM p ex
WHERE ex.id < p0.id
AND ex.val <> p0.val
AND NOT EXISTS (
SELECT * FROM p nx
WHERE nx.id < p0.id
AND nx.id > ex.id
)
);
UPDATE: Self-joining a non-recursive CTE (could also be a subquery instead of a CTE)
WITH drag AS (
SELECT id
, rank() OVER (ORDER BY id) AS rnk
, val
FROM p
)
SELECT d1.*
FROM drag d1
JOIN drag d0 ON d0.rnk = d1.rnk -1
WHERE d1.val <> d0.val
;
This nonrecursive CTE approach is surprisingly fast, although it needs an implicit sort.
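How fast it is in practice depends on the data and the server, so it is worth measuring. A minimal sketch for comparing it with the lag() variant, assuming a hypothetical test table p_big filled with random booleans (name and row count are made up for illustration):

```sql
-- Hypothetical test table with one million rows of random booleans.
CREATE TEMP TABLE p_big AS
SELECT g AS id, random() < 0.5 AS val
FROM generate_series(1, 1000000) AS g;

ALTER TABLE p_big ADD PRIMARY KEY (id);
ANALYZE p_big;

-- lag() variant
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, val
FROM (
    SELECT id, val, lag(val) OVER (ORDER BY id) AS prev_val
    FROM p_big
) x
WHERE val <> prev_val;   -- NULL for the first row, so it is filtered out

-- self-joined CTE variant
EXPLAIN (ANALYZE, BUFFERS)
WITH drag AS (
    SELECT id, rank() OVER (ORDER BY id) AS rnk, val
    FROM p_big
)
SELECT d1.*
FROM drag d1
JOIN drag d0 ON d0.rnk = d1.rnk - 1
WHERE d1.val <> d0.val;
```

The reported execution times and the sort and join nodes in the two plans show where each query spends its time on your own setup.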
Using two row_number() computations: this can also be done with the usual "islands and gaps" SQL technique (which could be useful if you can't use the lag() window function for some reason):
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select *, rn1 - rn2 as g
from cte1
order by id
So this query will give you all islands:
ID VAL RN1 RN2 G
1 1 1 1 0
2 0 2 1 1
3 0 3 2 1
4 1 4 2 2
5 1 5 3 2
6 1 6 4 2
7 0 7 3 4
You can see how the G field can be used to group these islands together:
with cte1 as (
select
*,
row_number() over(order by id) as rn1,
row_number() over(partition by val order by id) as rn2
from p
)
select
min(id) as id,
val
from cte1
group by val, rn1 - rn2
order by 1
So you'll get
ID VAL
1 1
2 0
4 1
7 0
The only thing left is to remove the first record, which can be done with a min(...) over() window function:
with cte1 as (
...
), cte2 as (
select
min(id) as id,
val,
min(min(id)) over() as mid
from cte1
group by val, rn1 - rn2
)
select id, val
from cte2
where id <> mid
And results:
ID VAL
2 0
4 1
7 0
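For convenience, the steps above can be assembled into one statement. This is a sketch that assumes the elided cte1 body is the same one shown earlier in this answer; the final order by is added here only to make the output order deterministic.

```sql
with cte1 as (
    select
        *,
        row_number() over(order by id) as rn1,                  -- position overall
        row_number() over(partition by val order by id) as rn2  -- position within same val
    from p
), cte2 as (
    select
        min(id) as id,               -- first row of each island
        val,
        min(min(id)) over() as mid   -- first row of the whole table
    from cte1
    group by val, rn1 - rn2          -- rn1 - rn2 is constant within an island
)
select id, val
from cte2
where id <> mid                      -- drop the very first island start
order by id;
```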
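A simple inner join can do it, as long as the id values are consecutive with no gaps (a serial column can have gaps after deletes or rolled-back inserts). SQL Fiddle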
select p2.id, p2.val
from
p p1
inner join
p p2 on p2.id = p1.id + 1
where p2.val != p1.val