
PostgreSQL - Update using a join on a text column - how to improve performance?

I've been struggling with this update, which I have to do by matching on text columns instead of IDs (I'm importing legacy data):

UPDATE contribs lc SET
       organization_id = (select o.organization_id from aux_orgs o WHERE UPPER(o.name)=UPPER(lc.committee_name));

contribs has 4.4 million records, of which 4.2 million match the subquery. aux_orgs has 7,185 records.

Current indexes on contribs:

    "idx_individual_contributions_committee_name" btree (committee_name)
    "idx_individual_contributions_organization_id" btree (organization_id)

Indexes on aux_orgs:

    "idx_aux_orgs_names" btree (name)

Is there any way to speed this up? I let it run for more than 6 hours without it finishing.

I'm using a database instance on AWS running PostgreSQL 10.4.

Output of EXPLAIN:

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Update on contribs lc  (cost=0.00..1030617572.50 rows=5114850 width=253)
   ->  Seq Scan on contribs lc  (cost=0.00..1030617572.50 rows=5114850 width=253)
         SubPlan 1
           ->  Seq Scan on aux_orgs o  (cost=0.00..201.42 rows=36 width=16)
                 Filter: (upper(name) = upper(lc.committee_name))
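Reading the plan: SubPlan 1 (a sequential scan over aux_orgs) runs once per row of contribs, so the total estimated cost appears to be roughly the estimated row count times the SubPlan's cost, with a small remainder for scanning contribs itself:

$$
5\,114\,850 \times 201.42 = 1\,030\,233\,087, \qquad 1\,030\,617\,572.50 - 1\,030\,233\,087 \approx 384\,486
$$

In other words, almost all of the estimated cost seems to come from re-scanning the 7,185 aux_orgs rows for every contribs row, which is what an index lookup or a join should eliminate.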

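Create an index on the expression UPPER(name). The existing idx_aux_orgs_names index is on name itself, so it cannot be used for the UPPER(o.name) = ... comparison; an expression index lets each execution of the subquery do an index lookup instead of the sequential scan shown in the plan: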

CREATE INDEX ind_augxorgs_name ON aux_orgs(UPPER(name));

UPDATE contribs lc
    SET organization_id = (select o.organization_id from aux_orgs o WHERE UPPER(o.name) = UPPER(lc.committee_name));
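Another option worth trying, sketched here without having been tested against this data, is to rewrite the correlated subquery as a join using UPDATE ... FROM, so the planner can hash the small aux_orgs table once instead of running a subquery per row. Be aware of two behavioural differences: contribs rows with no match keep their current organization_id instead of being set to NULL, and duplicates in aux_orgs are handled differently. The subquery version fails with an error if several aux_orgs rows share the same UPPER(name); this sketch assumes that picking the lowest organization_id in that case is acceptable. Dropping and rebuilding the organization_id index around the update is also an assumption that it is cheaper than maintaining it row by row.

```sql
BEGIN;

-- Statistics for the expression index on UPPER(name) are only
-- gathered when the table is analyzed.
ANALYZE aux_orgs;

-- Optional (assumption): rebuild this index once after the update
-- instead of maintaining it for every updated row.
DROP INDEX idx_individual_contributions_organization_id;

-- Join-based rewrite of the correlated subquery.
-- DISTINCT ON keeps one aux_orgs row per UPPER(name), taking the
-- lowest organization_id when names collide.
UPDATE contribs lc
SET    organization_id = o.organization_id
FROM  (
        SELECT DISTINCT ON (UPPER(name))
               UPPER(name) AS upper_name,
               organization_id
        FROM   aux_orgs
        ORDER  BY UPPER(name), organization_id
      ) o
WHERE  o.upper_name = UPPER(lc.committee_name);

-- Recreate the index with its original definition.
CREATE INDEX idx_individual_contributions_organization_id
    ON contribs (organization_id);

COMMIT;
```

Running EXPLAIN on the UPDATE alone (without the other statements) should show whether the planner picks a hash join over aux_orgs rather than a per-row SubPlan.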

That said, if you are updating (nearly) all the rows in the table, the update will still be slow: PostgreSQL writes a new version of every updated row, plus new entries in the table's indexes. Sometimes it is faster to recreate the table than to update all of its rows.
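A rough sketch of the recreate-the-table approach follows. It assumes that nothing else (views, foreign keys) depends on contribs, and that any primary key, NOT NULL constraints, defaults, and grants are recreated by hand, since CREATE TABLE AS does not copy them. The names contribs_new and resolved_org_id are hypothetical. Note that organization_id ends up as the last column of the new table.

```sql
BEGIN;

-- Build the new table in one sequential pass. LEFT JOIN keeps rows
-- with no matching organization (organization_id becomes NULL, as
-- with the original subquery). DISTINCT ON prevents duplicated
-- contribs rows when several aux_orgs rows share the same UPPER(name).
CREATE TABLE contribs_new AS
SELECT lc.*,
       o.organization_id AS resolved_org_id
FROM   contribs lc
LEFT   JOIN (
        SELECT DISTINCT ON (UPPER(name))
               UPPER(name) AS upper_name,
               organization_id
        FROM   aux_orgs
        ORDER  BY UPPER(name), organization_id
       ) o ON o.upper_name = UPPER(lc.committee_name);

-- Replace the old column with the resolved one.
ALTER TABLE contribs_new DROP COLUMN organization_id;
ALTER TABLE contribs_new RENAME COLUMN resolved_org_id TO organization_id;

-- Swap the tables (dropping contribs also drops its old indexes).
DROP TABLE contribs;
ALTER TABLE contribs_new RENAME TO contribs;

-- Recreate the indexes that existed on the old table.
CREATE INDEX idx_individual_contributions_committee_name
    ON contribs (committee_name);
CREATE INDEX idx_individual_contributions_organization_id
    ON contribs (organization_id);

COMMIT;

-- Refresh planner statistics for the new table.
ANALYZE contribs;
```

Because the whole swap runs in one transaction, a failure at any step rolls back to the original contribs table.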
