简体   繁体   中英

PostgreSQL 9.6 - INSERT ON CONFLICT DO UPDATE WHERE RETURNING

I have created and automated a custom workflow for eagerly synchronizing a series of materialized views. After trying several different approaches (for one to many relationships), I have found the most reliable synchronization workflow is to delete all records that may have been impacted and then insert the new records.

DELETE FROM
    some_materialized_view
WHERE
    set_of_records_key = some_value;

INSERT INTO
    some_materialized_view
SELECT
    *
FROM
    some_query_generating_some_materialized_view;

Note: some_query_generating_some_materialized_view is a complex read operation that takes a non-trivial amount of resources to execute. Additionally, some_materialized_view is heavily indexed with several foreign keys and other constraints.

This feels extremely heavy handed. This workflow comes with excessive delete and insert operations that are often times needless as some of the deleted records may have been identical, or similar enough to be a candidate for an UPDATE.

I would prefer something like the following:

DELETE FROM
    some_materialized_view
USING
    (
        SELECT
            unique_key
        FROM
            some_materialized_view
        WHERE
            set_of_records_key = some_value

        EXCEPT
        INSERT INTO
            some_materialized_view
        SELECT
            *
        FROM
            some_query_generating_some_materialized_view
        ON CONFLICT (...) DO UPDATE
        SET 
            foo = EXCLUDED.foo,
            bar = EXCLUDED.bar,
            ...
        WHERE
            some_materialized_view <> EXCLUDED
        RETURNING
            unique_key
    ) AS sub_query
WHERE
    some_materialized_view.unique_key = sub_query.unique_key;

The problem is in the ON CONFLICT ... DO UPDATE ... WHERE ... RETURNING clause.

as addressed in this question: How to use RETURNING with ON CONFLICT in PostgreSQL?

the RETURNING clause only returns impacted records. So records not impacted are not returned, and thus (in the example above) deleted inappropriately.

It seems the only way to get RETURNING to actually return all records is unnecessarily update identical records by removing the WHERE some_materialized_view <> EXCLUDED clause, or run some_query_generating_some_materialized_view again in another EXCEPT clause... both options are also not ideal.

So, what am I missing? Are there other options available? If not, in general , is it preferred to perform a complex, resource intensive, read operation over a needless UPDATE (remembering the associated index maintenance and check constraints)?

Note: I'm not including EXPLAIN ANALYZE results as this is not specific to a single query, rather a question in general. For the sake of maintainability and sanity this project needs to be consistent and this technique is used several times with tables of different structures and use cases (some read heavy, others write heavy).

  • Pseudo code (I don't like pseudo code)
  • demonstrating chained CTE
  • ignoring locking,serialisation and races.
  • (and semantics/context from the question)

WITH fresh AS ( -- workhorse: Called only and exactly once
        SELECT <keyfields> -- *
        FROM some_query_generating_some_materialized_view
        )
,upd AS ( -- update existing rows
        UPDATE some_materialized_view mv
        SET foo = fr.foo, bar = fr.bar
        FROM fresh fr
        WHERE mv.<keyfields = fr.<keyfields>
        RETURNING mv.<keyfields>
        )
/* 
, del AS ( 
        no deletes ???
        )
 */
        -- insert non existing rows.
INSERT INTO some_materialized_view mv ( <targetfields> )
SELECT fr.<srcfields>
FROM fresh fr
WHERE NOT EXISTS (
        SELECT *
        FROM upd nx
        WHERE nx.<keyfields> = fr.<keyfields>
        );

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM