How can I speed up the PostgreSQL UPDATE FROM SQL query below? It currently takes days to finish running.
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND (ip.status is null or ip.status != '6');
And why does it take days to run in the first place?
Most of the time I kill the query manually because it runs for more than 24 hours. The last time it finished successfully, it took almost 38 hours.
The import_parts table has 971971 rows.
The parts.part_parts table has 2196357 rows.
The parts.part_parts table has an index on upc, and id is the primary key of the table.
I already tried running VACUUM ANALYZE on the import_parts table and the parts.part_parts table before running the update query above, but the query still took too long, so I manually killed it after 30 minutes. I'm hoping to be able to run the query in under 30 minutes.
Here's the result of EXPLAIN when I run the query after running VACUUM ANALYZE on the import_parts table and the parts.part_parts table:
UPDATE 1:
I also tried setting enable_nestloop to off: SET enable_nestloop TO off
But the query still takes too long to run, so I manually killed it. Here's the result of EXPLAIN when enable_nestloop is turned off:
UPDATE 2:
Here's the result of EXPLAIN when using the query suggested by Abelisto in his answer to this post:
When I actually run the query though, I'm encountering this error:
ERROR: more than one row returned by a subquery used as an expression
I'm still figuring out how to fix the error.
First of all, try rewriting your query like this:
UPDATE import_parts ip
SET part_part_id = (
SELECT pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc)
WHERE status is null or status != '6';
As expected, it raises an error like
ERROR: more than one row returned by a subquery used as an expression
Fix it by adding conditions (the subquery should return exactly one row or no row for each row in the target table).
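One possible way to make the subquery return at most one row is sketched below. It assumes that picking the smallest id per upc is acceptable, which the question does not confirm. Note also that, unlike the original UPDATE ... FROM, this form sets part_part_id to NULL for rows that have no matching upc.

```sql
-- Sketch only: min() is an arbitrary tie-breaker (an assumption, not
-- something the question specifies). An aggregate without GROUP BY
-- always yields exactly one row, so the "more than one row" error
-- cannot occur.
UPDATE import_parts ip
SET part_part_id = (
    SELECT min(pp.id)
    FROM parts.part_parts pp
    WHERE pp.upc = ip.upc
)
WHERE ip.status IS NULL OR ip.status <> '6';
```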
From what you say, it seems that upc is not unique in parts.part_parts. Try running this:
select upc, count(*)
from parts.part_parts pp
group by upc
having count(*) > 1;
These duplicates are probably causing the performance problems. You could get around this by arbitrarily choosing a value, such as:
UPDATE import_parts ip
SET part_part_id = pp.id
FROM (SELECT pp.upc, MIN(pp.id) as id
FROM parts.part_parts pp
GROUP BY pp.upc
) pp
WHERE pp.upc = ip.upc AND (ip.status is null or ip.status <> '6');
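A possible variation on the query above, offered as an untested assumption rather than a measured result for these tables: PostgreSQL writes a new row version for every updated row, even when the value does not change, so skipping rows that already hold the right part_part_id may reduce the work on repeated runs.

```sql
-- Sketch only: same de-duplicated join as above, plus a filter that
-- skips rows whose part_part_id already equals the chosen id.
-- IS DISTINCT FROM treats NULL as a comparable value, so rows where
-- part_part_id is still NULL are updated as well.
UPDATE import_parts ip
SET part_part_id = pp.id
FROM (
    SELECT upc, MIN(id) AS id
    FROM parts.part_parts
    GROUP BY upc
) pp
WHERE pp.upc = ip.upc
  AND (ip.status IS NULL OR ip.status <> '6')
  AND ip.part_part_id IS DISTINCT FROM pp.id;
```

Whether this helps depends on how many rows already have the correct value; on a first run where part_part_id is empty everywhere, it changes nothing.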
Create an index on import_parts with the columns upc and status.
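A minimal sketch of such an index, using the column names from the question. The index name is made up for illustration, and whether the planner actually uses the index for this update is something only EXPLAIN on the real data can show.

```sql
-- Hypothetical index name; the columns upc and status come from the question.
CREATE INDEX import_parts_upc_status_idx
    ON import_parts (upc, status);

-- Refresh planner statistics after creating the index.
ANALYZE import_parts;
```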
I would recommend splitting the update into two statements.
I don't know your status values, but I suppose you have: null, 1, 2, 3, 4, 5, 6, 7
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND ip.status is null
;
UPDATE import_parts ip
SET part_part_id = pp.id
FROM parts.part_parts pp
WHERE pp.upc = ip.upc
AND ip.status IN(1, 2, 3, 4, 5, 7)
;
Of course, you need to replace 1, 2, 3, 4, 5, 7 with your own values (anything other than 6).
I also like the answer from @Gordon Linoff, but it depends on how many rows you have per upc.