I need to update one column in a PostgreSQL table that holds a very large amount of data.
Because the job might run continuously for one or two days, I want to do the update in batches and commit each batch separately. That way I can track progress, log any batches that fail, and rerun them manually later by supplying the failed offset and limit.
One approach I tried is the following PL/pgSQL block, which failed because row_number() cannot be used in a WHERE clause.
DO LANGUAGE plpgsql $$
DECLARE
    row_count_ integer;
    offset_ integer := 0;
    batch_size_ integer := 100000;
    limit_ integer := offset_ + batch_size_;
    total_rows_ integer;
BEGIN
    WHILE offset_ < total_rows_ LOOP
        limit_ := offset_ + batch_size_;
        UPDATE table1
        SET column1 = 'Value'
        WHERE row_number() over() >= offset_ AND row_number() over() < limit_;
        GET DIAGNOSTICS row_count_ = row_count;
        RAISE INFO '% rows updated from % to %', row_count_, offset_, limit_;
        offset_ := offset_ + batch_size_;
    END LOOP;
EXCEPTION WHEN OTHERS THEN
    RAISE NOTICE 'Transaction is rolling back, % : %', SQLSTATE, SQLERRM;
    ROLLBACK;
END $$;
I'm also fine with doing this from a Python script, but I need the fastest approach possible. Many of the articles I read use a SELECT subquery, which I believe is too expensive because of the join.
Could someone please suggest a better way to achieve this?
If the activity lasts for several days, doing the UPDATE in batches makes sense. You may want to run an explicit VACUUM on the table between batches to avoid table bloat.
About your core problem, I would say that the simplest solution would be to batch by primary key values, that is, run statements like:
UPDATE tab
SET col = newval
WHERE id <= 100000
AND /* additional criteria*/;
VACUUM tab;
UPDATE tab
SET col = newval
WHERE id > 100000 AND id <= 200000
AND /* additional criteria*/;
...
Keep repeating that until you reach the maximum id.
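Since the question mentions that a Python script would be acceptable, here is a minimal sketch of driving those range-based statements from Python. It is only an illustration under several assumptions: the table has an integer primary key named id, the names tab and col are the placeholders used above, 'Value' is the value from the question, and conn is a psycopg2 connection with autocommit enabled (VACUUM cannot run inside a transaction block, and autocommit also makes each batch commit on its own). The helper names id_batches and update_in_batches are made up for this example.

```python
from typing import Iterator, List, Tuple


def id_batches(min_id: int, max_id: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (low, high) pairs covering min_id..max_id; a batch is low < id <= high."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    low = min_id - 1  # exclusive lower bound, so min_id itself is included
    while low < max_id:
        high = min(low + batch_size, max_id)
        yield low, high
        low = high


def update_in_batches(conn, batch_size: int = 100_000) -> List[Tuple[int, int]]:
    """Run the UPDATE one id range at a time and return the ranges that failed.

    Assumption: conn is a psycopg2 connection with conn.autocommit = True,
    so every statement commits by itself and VACUUM is permitted.
    """
    failed: List[Tuple[int, int]] = []
    with conn.cursor() as cur:
        cur.execute("SELECT min(id), max(id) FROM tab")
        min_id, max_id = cur.fetchone()
        if min_id is None:  # empty table, nothing to do
            return failed
        for low, high in id_batches(min_id, max_id, batch_size):
            try:
                cur.execute(
                    "UPDATE tab SET col = %s WHERE id > %s AND id <= %s",
                    ("Value", low, high),
                )
                print(f"{cur.rowcount} rows updated for id in ({low}, {high}]")
                cur.execute("VACUUM tab")  # reclaim dead rows between batches
            except Exception as exc:  # broad on purpose: log and continue
                failed.append((low, high))
                print(f"batch ({low}, {high}] failed: {exc}")
    return failed
```

To use it, open the connection with psycopg2.connect(...), set conn.autocommit = True, and call update_in_batches(conn). The returned list holds the id ranges that raised an error, so they can be rerun by hand. Because batches are defined by id ranges and not by row counts, gaps in the id sequence simply produce smaller batches.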