
PostgreSQL Update Records in Batches

I need to update one column in a PostgreSQL table that holds a huge amount of data.

The job may run continuously for one or two days because of the data volume, so I need to process the rows in batches and commit each batch separately. That way I can track progress, log any batch that fails, and rerun failed batches manually later by supplying their offset and limit.

One approach I tried is the following PL/pgSQL block, which failed because row_number() cannot be used in a WHERE clause.

DO LANGUAGE plpgsql $$
DECLARE

    row_count_  integer;

    offset_     integer := 0;
    batch_size_ integer := 100000;
    limit_      integer := offset_ + batch_size_;

    total_rows_ integer;

BEGIN

    WHILE offset_ < total_rows_ LOOP
        limit_ := offset_ + batch_size_;
    
        UPDATE table1 
            SET column1 = 'Value' 
            WHERE row_number() over() >= offset_ AND row_number() over() < limit_;
        GET DIAGNOSTICS row_count_ = row_count;
        RAISE INFO '% rows updated from % to %', row_count_, offset_, limit_;
    
        offset_ := offset_ + batch_size_;
    END LOOP;

EXCEPTION WHEN OTHERS THEN 

    RAISE NOTICE 'Transaction is rolling back, % : %', SQLSTATE, SQLERRM;
    ROLLBACK;
   
END $$;

I'm even OK with doing this in a Python script, but it needs to be as fast as possible. Many of the articles I read use a SELECT subquery, which in my opinion is too expensive because of the join.

Could someone please help me with a better way to achieve this?

If the activity lasts for several days, doing the UPDATE in batches makes sense. You may want to run an explicit VACUUM on the table between batches to avoid table bloat.

About your core problem, I would say that the simplest solution would be to batch by primary key values, that is, run statements like:

UPDATE tab
SET col = newval
WHERE id <= 100000
  AND /* additional criteria*/;

VACUUM tab;

UPDATE tab
SET col = newval
WHERE id > 100000 AND id <= 200000
  AND /* additional criteria*/;

...

Keep repeating that until you reach the maximum id.
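Since the question allows a Python script, the loop over id ranges could be driven from Python along these lines. This is only a rough sketch under several assumptions: conn is taken to be a psycopg2-style DB-API connection (with %s placeholders and cursors usable as context managers), the names tab, col and id are the placeholders from the statements above, and id is assumed to be an integer primary key. The helper names id_batches and update_in_batches are made up for illustration.

```python
from typing import Iterator, List, Tuple


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (low, high) id ranges that cover min_id..max_id."""
    low = min_id
    while low <= max_id:
        high = min(low + batch_size - 1, max_id)
        yield low, high
        low = high + 1


def update_in_batches(conn, batch_size: int = 100_000) -> List[Tuple[int, int]]:
    """Update tab.col one id range at a time and return the ranges that failed.

    Assumption: conn behaves like a psycopg2 connection. Each batch is
    committed on its own, so a failure only rolls back that one batch.
    """
    failed: List[Tuple[int, int]] = []

    # Find the id range once, up front.
    with conn.cursor() as cur:
        cur.execute("SELECT min(id), max(id) FROM tab")
        min_id, max_id = cur.fetchone()
    conn.commit()

    if min_id is None:  # empty table, nothing to do
        return failed

    for low, high in id_batches(min_id, max_id, batch_size):
        try:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE tab SET col = %s WHERE id BETWEEN %s AND %s",
                    ("Value", low, high),
                )
                updated = cur.rowcount
            conn.commit()
            print(f"{updated} rows updated for id {low}..{high}")
        except Exception as exc:
            # Undo only this batch and remember it for a manual rerun.
            conn.rollback()
            failed.append((low, high))
            print(f"batch {low}..{high} failed: {exc}")

    return failed
```

The returned list plays the role of the failed offset and limit from the question: each entry is an id range that can be rerun by hand. Two caveats: if the id values have gaps, batches will not all touch the same number of rows, and the VACUUM between batches is left out here because VACUUM cannot run inside a transaction block, so it would need a connection in autocommit mode.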
