Update using join on big table - performance tips?

I've been struggling with this update, which never finishes:

update votings v
set voter_id = (select pv.number from voters pv WHERE pv.person_id = v.person_id);

The table currently has 96M records:

select count(0) from votings;
  count   
----------
 96575239
(1 row)

The update is apparently using an index:

explain update votings v                             
set voter_id = (select pv.number from voters pv WHERE pv.rl_person_id = v.person_id);
                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Update on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
   ->  Seq Scan on votings v  (cost=0.00..788637465.40 rows=91339856 width=1671)
         SubPlan 1
           ->  Index Scan using idx_voter_rl_person_id on voters pv  (cost=0.56..8.58 rows=1 width=9)
                 Index Cond: (rl_person_id = v.person_id)
(5 rows)

Here are the indexes I have on votings:

Indexes:
    "votings_pkey" PRIMARY KEY, btree (id)
    "votings_election_id_voter_id_key" UNIQUE CONSTRAINT, btree (election_id, person_id)
    "votings_external_id_external_source_key" UNIQUE CONSTRAINT, btree (external_id, external_source)
    "idx_votings_updated_at" btree (updated_at DESC)
    "idx_votings_vote_party" btree (vote_party)
    "idx_votings_vote_state_vote_party" btree (vote_state, vote_party)
    "idx_votings_voter_id" btree (person_id)
Foreign-key constraints:
    "votings_election_id_fkey" FOREIGN KEY (election_id) REFERENCES elections(id)
    "votings_voter_id_fkey" FOREIGN KEY (person_id) REFERENCES people_all(id)

Any idea what plays the biggest part in making the update run slowly: the number of rows, or the join being used?

One suggestion I can make here would be to use a covering index for the subquery lookup:

CREATE INDEX idx_cover ON voters (person_id, number);

While in the context of a select this might not offer much advantage over your current index on person_id alone, in the context of an update it might matter more. The reason is that, for an update, this index might relieve Postgres from having to create and maintain a copy of the original table in its state before the update.
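
As a rough sketch of how one might check whether a covering index is actually picked up, the statements below build an alternative form of the index and inspect the plan of the lookup. The index name idx_cover_incl is hypothetical, the INCLUDE clause assumes PostgreSQL 11 or later, and the column name person_id follows the answer above (the plan in the question shows rl_person_id, so adjust to whichever the table really has).

```sql
-- Hypothetical alternative to idx_cover: person_id is the search key,
-- number is carried in the index as a non-key payload column (PostgreSQL 11+).
CREATE INDEX idx_cover_incl ON voters (person_id) INCLUDE (number);

-- Index-only scans depend on the visibility map, which VACUUM maintains;
-- ANALYZE refreshes the planner statistics at the same time.
VACUUM (ANALYZE) voters;

-- Plan the same lookup the UPDATE's subquery performs.
-- The inner SELECT only supplies a sample person_id of the right type.
EXPLAIN
SELECT pv.number
FROM voters pv
WHERE pv.person_id = (SELECT v.person_id FROM votings v LIMIT 1);
```

If the plan reports an "Index Only Scan", the lookup no longer needs to visit the voters heap for each row.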

If you actually have 91339856 rows in votings, the 91339856 index scans on voters are certainly the dominant cost factor. The sequential scan will be faster by comparison.

You can probably boost performance if you don't force PostgreSQL to do a nested loop join:

UPDATE votings
SET voter_id = voters.number
FROM voters
WHERE votings.person_id = voters.person_id;
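
Before running the join form on 96M rows, it may be worth looking at its plan. This is only a sketch: plain EXPLAIN (without ANALYZE) plans the statement without executing it, so no rows are modified.

```sql
-- Plan only; nothing is updated because ANALYZE is not requested.
EXPLAIN
UPDATE votings
SET voter_id = voters.number
FROM voters
WHERE votings.person_id = voters.person_id;
```

A hash or merge join in the output would indicate that the per-row index scans are gone. Note one difference in behavior: the subquery form sets voter_id to NULL for votings rows with no matching voter, while the join form leaves those rows untouched.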

Updating all the rows in the table is going to be really expensive. I would suggest re-creating the table:

create table temp_votings as
    -- voters.number is the value the question copies into voter_id
    select v.*, vv.number
    from votings v join
         voters vv
         on vv.person_id = v.person_id;

For this query, you want an index on voters(person_id, number). I am guessing that person_id might already be the primary key; if so, no additional index is needed.
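
Because the new table is built with an inner join, any votings row without a matching voters row would be missing from it. A quick check along these lines, assuming the person_id column names used above, shows how many rows would be lost:

```sql
-- Count votings rows that have no corresponding voter;
-- an inner join would silently drop these from the re-created table.
SELECT count(*) AS unmatched_rows
FROM votings v
WHERE NOT EXISTS (
    SELECT 1
    FROM voters vv
    WHERE vv.person_id = v.person_id
);
```

If the count is not zero, a left join in the re-creation query would keep those rows with a NULL voter number instead.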

Then, you can replace the existing table, but back it up first:

truncate table votings;

insert into votings ( . . . )    -- list columns here
    select . . .                 -- and the same columns here
    from temp_votings;
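
The backup mentioned above could be as simple as a full copy taken before the truncate. This is a minimal sketch; votings_backup is a hypothetical table name, and the copy carries the data only, not the indexes or constraints.

```sql
-- Full copy of the current data (no indexes, constraints, or defaults).
CREATE TABLE votings_backup AS TABLE votings;

-- Sanity check: both counts should match before anything is truncated.
SELECT (SELECT count(*) FROM votings)        AS original_rows,
       (SELECT count(*) FROM votings_backup) AS backup_rows;
```

On a table of this size the copy needs roughly as much disk space as the original heap, so it is worth checking free space first.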
