Postgres UPDATE..FROM query with multiple updates on the same row

I am trying to optimise a bulk UPDATE statement in Postgres by using the UPDATE..FROM syntax to update from a list of values. It works, except when the same row is updated more than once in the same query.

For example, say I have a table "test" with columns "key" and "value":

update test as t set value = v.value from (values 
    ('key1', 'update1'), 
    ('key1', 'update2') ) 
    as v (key, value) 
where t.key = v.key;

My desired behavior is for the row with key 'key1' to be updated twice, finishing with value set to 'update2'. In practice, the value is sometimes set to update1 and sometimes to update2. Also, an update trigger function on the table is invoked only once.

The documentation ( http://www.postgresql.org/docs/9.1/static/sql-update.html ) explains why:

When a FROM clause is present, what essentially happens is that the target table is joined to the tables mentioned in the from_list, and each output row of the join represents an update operation for the target table. When using FROM you should ensure that the join produces at most one output row for each row to be modified. In other words, a target row shouldn't join to more than one row from the other table(s). If it does, then only one of the join rows will be used to update the target row, but which one will be used is not readily predictable.

Because of this indeterminacy, referencing other tables only within sub-selects is safer, though often harder to read and slower than using a join.
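The join the documentation describes can be made visible by running it as a plain SELECT. The following is a minimal sketch, assuming "key" and "value" are both text columns and that the table holds a single row; the column types, the 'initial' value, and the output aliases are assumptions made for illustration.

```sql
-- Hypothetical setup: column types and the starting row are assumed.
CREATE TABLE test (key text PRIMARY KEY, value text);
INSERT INTO test (key, value) VALUES ('key1', 'initial');

-- The same join that UPDATE..FROM performs, written as a SELECT.
-- It returns two rows for 'key1', one per entry in the values list,
-- so the UPDATE has two candidate values for a single target row.
SELECT t.key,
       t.value AS current_value,
       v.value AS candidate_value
FROM test AS t
JOIN (VALUES
    ('key1', 'update1'),
    ('key1', 'update2')
) AS v (key, value) ON t.key = v.key;
```

Whenever this SELECT returns more than one row for a given key, the corresponding UPDATE is in the unpredictable case the documentation warns about.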

Is there any way to reformulate this query to achieve the behavior I'm looking for? Does the reference to sub-selects in the documentation give a hint?
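On the sub-select hint: one possible reading is to pick the value inside a correlated sub-select, so the choice among duplicates is made explicitly. This is only a sketch under an assumption: the values list carries no ordering of its own, so a hypothetical "seq" column is added to record which entry should win.

```sql
-- "seq" is a hypothetical ordering column; the highest seq per key wins.
WITH v (seq, key, value) AS (
    VALUES (1, 'key1', 'update1'),
           (2, 'key1', 'update2')
)
UPDATE test AS t
SET value = (
    -- the sub-select returns exactly one value for the current row
    SELECT v.value
    FROM v
    WHERE v.key = t.key
    ORDER BY v.seq DESC
    LIMIT 1
)
-- restrict to keys present in the list; otherwise unmatched rows become NULL
WHERE EXISTS (SELECT 1 FROM v WHERE v.key = t.key);
```

The WHERE EXISTS clause matters here: without it, every row of test whose key is missing from the list would have its value set to NULL.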

Example (assuming id is a PK in the target table, and {id, date_modified} is a PK in the source table):

UPDATE target dst
SET a = src.a, b = src.b
FROM source src
WHERE src.id = dst.id
-- keep only the newest source row per id
AND NOT EXISTS (
        SELECT *
        FROM source nx
        WHERE nx.id = src.id
        -- use an extra key field as tie-breaker
        AND nx.date_modified > src.date_modified
        );

(In effect, this deduplicates the source table, forcing it to the same PK as the target table.)
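The same deduplication idea can be applied to the values list from the question. The sketch below assumes a hypothetical "seq" column is added to the list as the tie-breaker, and it uses the Postgres-specific DISTINCT ON in place of NOT EXISTS; both are illustrative choices, not part of the original query.

```sql
UPDATE test AS t
SET value = v.value
FROM (
    -- keep one row per key: the one with the highest seq
    SELECT DISTINCT ON (key) key, value
    FROM (VALUES
        (1, 'key1', 'update1'),   -- seq is a hypothetical tie-breaker
        (2, 'key1', 'update2')
    ) AS raw (seq, key, value)
    ORDER BY key, seq DESC
) AS v
WHERE t.key = v.key;
```

With the list reduced to one row per key, the row for 'key1' ends up with 'update2' every time. Note that a single UPDATE statement still modifies each target row at most once, so an update trigger fires once per row; if the intermediate update to 'update1' must actually happen, it would need separate UPDATE statements.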
