PostgreSQL - column value changed - select query optimization

Say we have a table:

CREATE TABLE p
(
   id serial NOT NULL, 
   val boolean NOT NULL, 
   PRIMARY KEY (id)
);

Populated with some rows:

insert into p (val)
values (true),(false),(false),(true),(true),(true),(false);

ID  VAL
1   1
2   0
3   0
4   1
5   1
6   1
7   0

I want to find the rows where the value differs from the value in the previous row (ordered by id). So the result of my query should be:

ID  VAL
2   0
4   1
7   0

I have a solution with joins and subqueries:

select min(id) id, val from
(
  select p1.id, p1.val, max(p2.id) last_prev
  from p p1
  join p p2
    on p2.id < p1.id and p2.val != p1.val
  group by p1.id, p1.val
) tmp
group by val, last_prev
order by id;

But it is very inefficient and will be extremely slow for tables with many rows.
I believe there could be a more efficient solution using PostgreSQL window functions?

This is how I would do it with an analytic function:

SELECT id, val
  FROM ( SELECT id, val
           ,LAG(val) OVER (ORDER BY id) AS prev_val
       FROM p ) x
  WHERE val <> COALESCE(prev_val, val)
  ORDER BY id

Update (some explanation):

Analytic functions operate as a post-processing step. The query result is broken into groupings (PARTITION BY), and the analytic function is applied within the context of each grouping.

In this case, the query is a selection from p. The analytic function being applied is LAG. Since there is no PARTITION BY clause, there is only one grouping: the entire result set. This grouping is ordered by id. LAG returns the value from the previous row in the grouping, using the specified order. The result is that each row gets an additional column (aliased prev_val) holding the val of the preceding row. That is the subquery.

Then we look for rows where the val does not match the val of the previous row (prev_val). The COALESCE handles the special case of the first row which does not have a previous value.
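The two steps described above can be tried out with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added), and it stores the booleans as 1/0 integers.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption, see above).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
conn.executemany("INSERT INTO p (val) VALUES (?)",
                 [(v,) for v in (1, 0, 0, 1, 1, 1, 0)])

# Step 1: the subquery alone, exposing the previous row's val next to each row.
with_prev = conn.execute(
    "SELECT id, val, LAG(val) OVER (ORDER BY id) AS prev_val FROM p ORDER BY id"
).fetchall()

# Step 2: the outer query filters on the column computed by the subquery.
changes = conn.execute(
    """
    SELECT id, val
    FROM  (SELECT id, val, LAG(val) OVER (ORDER BY id) AS prev_val FROM p) x
    WHERE val <> COALESCE(prev_val, val)
    ORDER BY id
    """
).fetchall()

for row in with_prev:
    print(row)      # first row has prev_val = None (no previous row)
print(changes)      # [(2, 0), (4, 1), (7, 0)]
```

The first row comes back with prev_val as NULL (None in Python), which is the case COALESCE covers.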

Analytic functions may seem a bit strange at first, but searching for "analytic functions" turns up many examples that walk through how they work. For example: http://www.cs.utexas.edu/~cannata/dbms/Analytic%20Functions%20in%20Oracle%208i%20and%209i.htm Just remember that they are a post-processing step: you can't filter on the value of an analytic function unless you wrap it in a subquery.

Window function

Instead of calling COALESCE, you can provide a default directly as the third argument of the window function lag(). That is a minor detail in this case, since all columns are defined NOT NULL, but it can be essential for distinguishing "no previous row" from "NULL in previous row".

SELECT id, val
FROM  (
   SELECT id, val, lag(val, 1, val) OVER (ORDER BY id) <> val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;

Compute the result of the comparison immediately, since the previous value is not of interest per se, only a possible change. Shorter and may be a tiny bit faster.
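To illustrate why the default argument of lag() matters once a column is nullable, here is a minimal sketch. It again uses Python's built-in sqlite3 as a stand-in for PostgreSQL (assuming SQLite 3.25+). The table q and the sentinel value -1 are hypothetical and chosen only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table q with a nullable column (not part of the original question).
conn.execute("CREATE TABLE q (id INTEGER PRIMARY KEY, val INTEGER)")
conn.executemany("INSERT INTO q (val) VALUES (?)", [(None,), (1,), (None,)])

rows = conn.execute(
    """
    SELECT id,
           lag(val)        OVER (ORDER BY id) AS plain_lag,
           lag(val, 1, -1) OVER (ORDER BY id) AS lag_with_default
    FROM   q
    ORDER  BY id
    """
).fetchall()

for row in rows:
    print(row)
# (1, None, -1)    no previous row: only the default makes this visible
# (2, None, None)  previous row exists and holds NULL
# (3, 1, 1)
```

With plain lag(), rows 1 and 2 look identical. With a default, only the row that really has no predecessor gets the sentinel.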

If you consider the first row to be "changed" (unlike what your demo output suggests), you need to account for NULL values, even though your columns are defined NOT NULL. Basic lag() returns NULL when there is no previous row:

SELECT id, val
FROM  (
   SELECT id, val, lag(val) OVER (ORDER BY id) IS DISTINCT FROM val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;

Or employ the additional parameters of lag() once again:

SELECT id, val
FROM  (
   SELECT id, val, lag(val, 1, NOT val) OVER (ORDER BY id) <> val AS changed
   FROM   p
   ) sub
WHERE  changed
ORDER  BY id;

Recursive CTE

As proof of concept. :) Performance won't keep up with posted alternatives.

WITH RECURSIVE cte AS (
   SELECT id, val
   FROM   p
   WHERE  NOT EXISTS (
      SELECT 1
      FROM   p p0
      WHERE  p0.id < p.id
      )

   UNION ALL
   SELECT p.id, p.val
   FROM   cte
   JOIN   p ON p.id   > cte.id
           AND p.val <> cte.val
   WHERE NOT EXISTS (
     SELECT 1
     FROM   p p0
     WHERE  p0.id   > cte.id
     AND    p0.val <> cte.val
     AND    p0.id   < p.id
     )
  )
SELECT * FROM cte;

With an improvement from @wildplasser.

SQL Fiddle demonstrating all.

This can even be done without window functions:

SELECT * FROM p p0
WHERE EXISTS (
        SELECT * FROM p ex
        WHERE ex.id < p0.id
        AND ex.val <> p0.val
        AND NOT EXISTS (
                SELECT * FROM p nx
                WHERE nx.id < p0.id
                AND nx.id > ex.id
                )
        );

UPDATE: Self-joining a non-recursive CTE (could also be a subquery instead of a CTE)

WITH drag AS (
        SELECT id
        , rank() OVER (ORDER BY id) AS rnk
        , val
        FROM p
        )
SELECT d1.*
FROM drag d1
JOIN drag d0 ON d0.rnk = d1.rnk -1
WHERE d1.val <> d0.val
        ;

This nonrecursive CTE approach is surprisingly fast, although it needs an implicit sort.

Using two row_number() computations: this can also be done with the usual "gaps and islands" SQL technique (which could be useful if you can't use the lag() window function for some reason):

with cte1 as (
    select
        *,
        row_number() over(order by id) as rn1,
        row_number() over(partition by val order by id) as rn2
    from p
)
select *, rn1 - rn2 as g
from cte1
order by id

This query gives you all the islands:

ID VAL RN1 RN2  G
1   1   1   1   0
2   0   2   1   1
3   0   3   2   1
4   1   4   2   2
5   1   5   3   2
6   1   6   4   2
7   0   7   3   4

You can see how the G column can be used to group these islands together:

with cte1 as (
    select
        *,
        row_number() over(order by id) as rn1,
        row_number() over(partition by val order by id) as rn2
    from p
)
select min(id) as id, val
from cte1
group by val, rn1 - rn2
order by 1

So you'll get

ID VAL
1   1
2   0
4   1
7   0

The only thing left is to remove the first record, which can be done by computing the overall minimum with the min(...) over() window function:

with cte1 as (
   ...
), cte2 as (
    select
        min(id) as id,
        val,
        min(min(id)) over() as mid
    from cte1
    group by val, rn1 - rn2
)
select id, val
from cte2
where id <> mid

And results:

ID VAL
2   0
4   1
7   0

A simple inner join can do it, provided the id values are consecutive with no gaps (p1.id + 1 must always be the id of the next row):

select p2.id, p2.val
from
    p p1
    inner join
    p p2 on p2.id = p1.id + 1
where p2.val != p1.val
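Note that the join on p1.id + 1 only pairs each row with its predecessor when the ids are consecutive. The sketch below shows what happens otherwise. It uses Python's built-in sqlite3 as a stand-in for PostgreSQL (assuming SQLite 3.25+ for the window function) and deletes row 3 from the sample data to create a gap; the ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (id INTEGER PRIMARY KEY, val INTEGER NOT NULL)")
conn.executemany("INSERT INTO p (val) VALUES (?)",
                 [(v,) for v in (1, 0, 0, 1, 1, 1, 0)])
conn.execute("DELETE FROM p WHERE id = 3")  # introduce a gap in the id sequence

# Join on id + 1: row 4 has no partner with id 3, so its change is missed.
join_result = conn.execute(
    """
    SELECT p2.id, p2.val
    FROM   p p1
    JOIN   p p2 ON p2.id = p1.id + 1
    WHERE  p2.val <> p1.val
    ORDER  BY p2.id
    """
).fetchall()

# lag() compares against the previous row in id order, regardless of gaps.
lag_result = conn.execute(
    """
    SELECT id, val
    FROM  (SELECT id, val, lag(val) OVER (ORDER BY id) AS prev_val FROM p) x
    WHERE  val <> COALESCE(prev_val, val)
    ORDER  BY id
    """
).fetchall()

print(join_result)  # [(2, 0), (7, 0)]
print(lag_result)   # [(2, 0), (4, 1), (7, 0)]
```

Since serial columns can end up with gaps (deleted rows, rolled-back inserts), the join variant is only safe when the ids are known to be dense.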
