Skip rows if a subquery returns multiple rows in Postgres

Question

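I want to update the table prod_replay_out based on the result of a subquery in Postgres. For some target rows the subquery returns multiple rows; I want to skip those and update only where the subquery returns a single row.

I have looked at the linked question "Subquery returns more than 1 row error", but the max() function does not give my expected result. Could you suggest how to modify the query? Thank you.

prod_replay_out has the following columns:

seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, in_msg_time, msg_type, cdsx_time

prod_replay_in has the following columns:

seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, msg_type, cdsx_time

What have I tried?

Please find the update SQL below:

Update SQL: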

update prod_replay_out O  
   set in_msg_id = 
        (Select id
           From prod_replay_in I
          Where I.msg_type   = 'CDST010'
            and I.seller     = O.seller
            and I.buyer      = O.buyer
            and I.sender_tag = O.sender_tag
            and I.seller_tag = O.seller_tag
            and I.buyer_tag  = O.buyer_tag
            and I.isin       = O.isin
            and I.quantity   = O.quantity
            and I.cdsx_time  = O.in_msg_time
            and I.cdsx_time::text like '2020-05-12%'
         ) 
where O.msg_type = 'CDST01C'
and O.cdsx_time::text like '2020-05-12%';

I tried the following solution. Is this the right approach, or does it have any loopholes?

update prod_replay_out O  
   set in_msg_id = 
        (Select id
           From prod_replay_in I
          Where I.msg_type   = 'CDST010'
            and I.seller     = O.seller
            and I.buyer      = O.buyer
            and I.sender_tag = O.sender_tag
            and I.seller_tag = O.seller_tag
            and I.buyer_tag  = O.buyer_tag
            and I.isin       = O.isin
            and I.quantity   = O.quantity
            and I.cdsx_time  = O.in_msg_time
            and I.cdsx_time::text like '2020-05-12%'
            and 1 = (Select count(id)
                       From prod_replay_in I
                      Where I.msg_type   = 'CDST010'
                        and I.seller     = O.seller
                        and I.buyer      = O.buyer
                        and I.sender_tag = O.sender_tag
                        and I.seller_tag = O.seller_tag
                        and I.buyer_tag  = O.buyer_tag
                        and I.isin       = O.isin
                        and I.quantity   = O.quantity
                        and I.cdsx_time  = O.in_msg_time
                        and I.cdsx_time::text like '2020-05-12%'
                    )    
                )
where O.msg_type = 'CDST01C'
  and O.cdsx_time::text like '2020-05-12%';

Answer 1

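Query

Most importantly, don't use correlated subqueries. They are an inferior tool for this job. Use a subquery in the FROM clause instead.

This only updates where exactly one matching candidate row is found in the source table (neither none nor several), and only where the update actually changes the value: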

UPDATE prod_replay_out o
SET    in_msg_id = i.id
FROM  (
   SELECT i.id, i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time
   FROM   prod_replay_in i
   WHERE  i.msg_type   = 'CDST010'
   AND    i.cdsx_time >= '2020-05-12'     -- ① "sargable" expression
   AND    i.cdsx_time <  '2020-05-13'     -- ② don't cast to date, it's a valid timestamp literal
   AND    NOT EXISTS (                    -- ③ EXISTS is typically faster than counting
      SELECT FROM prod_replay_in x
      WHERE  x.id <> i.id                 -- ④ unique
      AND   (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)  -- ⑤ short syntax
        =   (x.seller, x.buyer, x.sender_tag, x.seller_tag, x.buyer_tag, x.isin, x.quantity, x.cdsx_time)
      )
   ) i
WHERE  o.msg_type = 'CDST01C'
AND   (i.seller, i.buyer, i.sender_tag, i.seller_tag, i.buyer_tag, i.isin, i.quantity, i.cdsx_time)
  =   (o.seller, o.buyer, o.sender_tag, o.seller_tag, o.buyer_tag, o.isin, o.quantity, o.in_msg_time)  -- ⑥ o.cdsx_time?
-- AND    o.cdsx_time >= '2020-05-12'     -- ⑦ redundant
-- AND    o.cdsx_time <  '2020-05-13'
AND   o.in_msg_id IS DISTINCT FROM i.id   -- ⑧ avoid empty updates
;

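① As GMB already suggested, transform this predicate into a "sargable" expression. That is typically faster and can use index support.

② But don't cast to date if cdsx_time is a timestamp column (which seems likely). '2020-05-12' is a perfectly valid timestamp literal, signifying the first instant of that day. See:

Generating time series between two dates in PostgreSQL

If it is a timestamptz column, consider the possible influence of the timezone setting. See:

Ignoring time zones altogether in Rails and PostgreSQL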
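To make point ② concrete, here is a small standalone sketch (no tables needed; the sample values are made up for illustration). It shows that a date-only literal read as a timestamp means midnight at the start of that day, so the half-open range covers exactly one calendar day.

```sql
-- A date-only literal interpreted as timestamp means 00:00:00 of that day.
SELECT timestamp '2020-05-12' = timestamp '2020-05-12 00:00:00' AS is_midnight;  -- true

-- Half-open range: includes the last instant of 2020-05-12,
-- excludes the first instant of 2020-05-13.
SELECT ts
     , ts >= '2020-05-12' AND ts < '2020-05-13' AS in_range
FROM  (VALUES (timestamp '2020-05-12 00:00:00')        -- in_range: true
            , (timestamp '2020-05-12 23:59:59.999999') -- in_range: true
            , (timestamp '2020-05-13 00:00:00')        -- in_range: false
      ) AS v(ts);

-- For timestamptz the same literal is interpreted in the session time zone:
SHOW timezone;
```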

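③ EXISTS is typically more efficient than counting all rows, because it can stop as soon as one other row is found. This matters especially if there can be many peers and index support is available. See:

Select rows which are not present in other table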
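The difference between the two formulations can be sketched on toy data. The CTE t and its columns id and k are hypothetical stand-ins for prod_replay_in and its matching columns, not part of the original schema.

```sql
-- Hypothetical toy data: key 'a' occurs once, key 'b' twice.
WITH t(id, k) AS (VALUES (1, 'a'), (2, 'b'), (3, 'b'))
SELECT i.id, i.k
FROM   t i
WHERE  NOT EXISTS (            -- may stop at the first duplicate found
   SELECT FROM t x
   WHERE  x.k  = i.k
   AND    x.id <> i.id
   );
-- keeps only the row with k = 'a'

-- Same result by counting, which has to visit every peer row:
WITH t(id, k) AS (VALUES (1, 'a'), (2, 'b'), (3, 'b'))
SELECT i.id, i.k
FROM   t i
WHERE  (SELECT count(*) FROM t x WHERE x.k = i.k) = 1;
```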

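④ Assuming id is unique (or the PK). Otherwise use the system column ctid for the job. See:

How do I (or can I) SELECT DISTINCT on multiple columns?

⑤ Convenient short syntax comparing ROW values, equivalent to the spelled-out list of AND-ed equality checks. See:

Enforcing index scan for multicolumn comparison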
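As a quick illustration of ⑤, the following standalone sketch compares the short row syntax with the spelled-out form. It also shows a side effect worth keeping in mind: if any compared column is NULL, neither form evaluates to true, so such rows never match.

```sql
SELECT (1, 'a') = (1, 'a')               AS row_eq       -- true
     , (1 = 1 AND 'a' = 'a')             AS spelled_out  -- true, same meaning
     , (1, NULL::text) = (1, NULL::text) AS with_null;   -- null, so not a match
```

If NULL values in the matching columns should count as equal, the comparison would need IS NOT DISTINCT FROM instead of =, which is a different query than the one shown in the answer.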

⑥ Your query has:

and I.cdsx_time  = O.in_msg_time         -- !?
and I.cdsx_time::text like '2020-05-12%'

... but:

O.cdsx_time::text like '2020-05-12%'

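Didn't you mean to write and I.cdsx_time = O.cdsx_time?

⑦ Would be noise. The restriction is already enforced in the subquery. (It would not help index support either.)

⑧ This matters if some rows may already hold the desired value. The operation is then skipped instead of writing an identical row version at full cost.

If both columns are defined NOT NULL, simplify to o.in_msg_id <> i.id. Again, see:

Update a column of a table with a column of another table in PostgreSQL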
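The reason ⑧ uses IS DISTINCT FROM rather than <> can be seen in a standalone sketch. in_msg_id is presumably NULL before the first update, and a plain <> against NULL yields NULL, which would filter the row out.

```sql
SELECT 1 <> 2                               AS ne_plain       -- true
     , 1 <> NULL::int                       AS ne_null        -- null: row would be skipped
     , 1 IS DISTINCT FROM NULL::int         AS distinct_null  -- true: row gets updated
     , NULL::int IS DISTINCT FROM NULL::int AS both_null      -- false
     , 1 IS DISTINCT FROM 1                 AS same_value;    -- false: empty update avoided
```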

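Indexes

If performance is an issue, or you run this repeatedly, consider indexes like the following.

For the first step, identifying source row candidates (steps are listed in the order of the expected query plan):

CREATE INDEX ON prod_replay_in (msg_type, cdsx_time);

For the second step, excluding duplicates:

CREATE INDEX ON prod_replay_in (seller, buyer, sender_tag, seller_tag, buyer_tag, isin, quantity, cdsx_time);

Or any small subset of these columns that is selective enough. A smaller index on fewer columns is typically more efficient if it lets through relatively few additional rows as "false positives" in the index scan. As long as those are few, they are eliminated cheaply in the following FILTER step.

For the final step, identifying target rows:

CREATE INDEX ON prod_replay_out (msg_type, in_msg_time);

Again: or any small subset that is selective enough.

(The index name is omitted so that Postgres picks a unique one; index names must be unique within a schema, so the three statements cannot share one name.)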
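Whether the planner actually uses these indexes can be checked with EXPLAIN. The sketch below assumes the tables from the question and uses a deliberately simplified UPDATE as a stand-in; substitute the full statement from above. Since EXPLAIN ANALYZE really executes the statement, the sketch wraps it in a transaction that is rolled back.

```sql
BEGIN;

-- ANALYZE executes the UPDATE; BUFFERS adds I/O statistics to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
UPDATE prod_replay_out o
SET    in_msg_id = i.id
FROM   prod_replay_in i
WHERE  i.msg_type    = 'CDST010'
AND    i.cdsx_time  >= '2020-05-12'
AND    i.cdsx_time  <  '2020-05-13'
AND    o.msg_type    = 'CDST01C'
AND    o.in_msg_time = i.cdsx_time;   -- simplified join condition, for illustration only

ROLLBACK;  -- discard the changes made by the analyzed UPDATE
```

Index scan nodes naming the new indexes in the plan indicate they are being used; a sequential scan on a small table is not necessarily a problem.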

Answer 2

You want to update only when the subquery returns exactly one row. One option uses aggregation and having in the subquery:

update prod_replay_out o
set in_msg_id = (
    select max(id)
    from prod_replay_in i
    where 
        i.msg_type       = 'CDST010'
        and i.seller     = o.seller
        and i.buyer      = o.buyer
        and i.sender_tag = o.sender_tag
        and i.seller_tag = o.seller_tag
        and i.buyer_tag  = o.buyer_tag
        and i.isin       = o.isin
        and i.quantity   = o.quantity
        and i.cdsx_time  = o.in_msg_time
        and i.cdsx_time  >= '2020-05-12'::date
        and i.cdsx_time  <  '2020-05-13'::date
    having count(*) = 1
) 
where 
    o.msg_type = 'CDST01C'
    and o.cdsx_time  >= '2020-05-12'::date
    and o.cdsx_time  <  '2020-05-13'::date;

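Note that I rewrote the date filters to avoid the cast to text (you can use a half-open interval with date literals instead, which is much more efficient).

Note also that this sets in_msg_id to null when the subquery would return multiple rows (or no rows at all). If you want to avoid that, you can filter in the where clause: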
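Why the first query writes null in those cases can be reproduced without any tables. In this sketch (sample values are made up), the having clause removes the single aggregate row whenever the count is not 1, and a scalar subquery that returns no row evaluates to null.

```sql
SELECT (SELECT max(id) FROM (VALUES (7))      AS t(id)
        HAVING count(*) = 1)                      AS one_row   -- 7
     , (SELECT max(id) FROM (VALUES (7), (8)) AS t(id)
        HAVING count(*) = 1)                      AS two_rows  -- null
     , (SELECT max(id) FROM (VALUES (7))      AS t(id)
        WHERE  false
        HAVING count(*) = 1)                      AS no_rows;  -- null
```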

update prod_replay_out o
set in_msg_id = (
    select max(id)
    from prod_replay_in i
    where 
        i.msg_type       = 'CDST010'
        and i.seller     = o.seller
        and i.buyer      = o.buyer
        and i.sender_tag = o.sender_tag
        and i.seller_tag = o.seller_tag
        and i.buyer_tag  = o.buyer_tag
        and i.isin       = o.isin
        and i.quantity   = o.quantity
        and i.cdsx_time  = o.in_msg_time
        and i.cdsx_time  >= '2020-05-12'::date
        and i.cdsx_time  <  '2020-05-13'::date
    having count(*) = 1
) 
where 
    o.msg_type = 'CDST01C'
    and o.cdsx_time  >= '2020-05-12'::date
    and o.cdsx_time  <  '2020-05-13'::date
    -- only touch rows for which exactly one source row matches
    and (
        select count(*)
        from prod_replay_in i
        where 
            i.msg_type       = 'CDST010'
            and i.seller     = o.seller
            and i.buyer      = o.buyer
            and i.sender_tag = o.sender_tag
            and i.seller_tag = o.seller_tag
            and i.buyer_tag  = o.buyer_tag
            and i.isin       = o.isin
            and i.quantity   = o.quantity
            and i.cdsx_time  = o.in_msg_time
            and i.cdsx_time  >= '2020-05-12'::date
            and i.cdsx_time  <  '2020-05-13'::date
    ) = 1;

2 answers

Answer 1: score 4, posted 2020-06-26 14:02:54
Answer 2: score 3, accepted, posted 2020-06-25 12:39:07
