In my database, I save a tree-like data structure as a table `tab` with the columns `id` (primary key), `value`, `from_id` and `depth`, where `depth` (integer) represents the distance from the root of the tree.
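To make the layout concrete, here is a minimal sketch of `tab` holding a tiny, made-up tree. It uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for PostgreSQL (an assumption; the column types are guesses) and checks the invariant implied by the description: every child's `depth` is its parent's `depth` plus one.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL one (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tab (
        id      INTEGER PRIMARY KEY,
        value   TEXT,
        from_id INTEGER,          -- parent id; NULL for the root
        depth   INTEGER NOT NULL  -- distance from the root
    )
""")

# Hypothetical tree: 1 is the root, 2 and 3 are its children, 4 is a child of 2.
conn.executemany(
    "INSERT INTO tab (id, value, from_id, depth) VALUES (?, ?, ?, ?)",
    [(1, "root", None, 0), (2, "a", 1, 1), (3, "b", 1, 1), (4, "c", 2, 2)],
)

# Count child rows whose depth is not exactly one more than their parent's.
bad = conn.execute("""
    SELECT COUNT(*)
    FROM tab child
    JOIN tab parent ON parent.id = child.from_id
    WHERE child.depth <> parent.depth + 1
""").fetchone()[0]
print(bad)  # 0
```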
Now I would like to add rows to the table `tab` from another table `candidates` (columns `id`, `value`, `from_id`), but with two restrictions: 1) only new `id`s and 2) only rows where the `depth` is below some given threshold (e.g. 3 or 4). There may be more than one `from_id` in `tab` that points to a new row in `candidates`.
Being a Postgres beginner, I hope my approach is correct, but it is very inefficient:
```sql
insert into tab
select distinct c.id, c.value, c.from_id, t.depth+1 as depth
from candidates as c
join tab as t on t.id=c.from_id
where depth<3 and c.id not in
(select id from tab);
```
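A small reproduction of the query above, using an in-memory SQLite database instead of PostgreSQL (assumption) and made-up data. The unqualified `depth` in the `WHERE` clause is written as `t.depth`, which is how PostgreSQL resolves it (output aliases are not visible in `WHERE`), and an explicit column list is added; otherwise the query is unchanged. The data contains an `id` already present in `tab`, a duplicated candidate row, a candidate whose parent is too deep, and one whose parent is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id INTEGER PRIMARY KEY, value TEXT, from_id INTEGER, depth INTEGER);
    CREATE TABLE candidates (id INTEGER, value TEXT, from_id INTEGER);

    INSERT INTO tab VALUES (1, 'root', NULL, 0), (2, 'a', 1, 1), (3, 'b', 2, 2), (4, 'c', 3, 3);
    -- 2 already exists; 5 is duplicated; 6 hangs below depth 3; 7 has no parent in tab.
    INSERT INTO candidates VALUES (2, 'a', 1), (5, 'd', 1), (5, 'd', 1), (6, 'e', 4), (7, 'f', 99);
""")

# The question's query: DISTINCT collapses the duplicated candidate row.
conn.execute("""
    INSERT INTO tab (id, value, from_id, depth)
    SELECT DISTINCT c.id, c.value, c.from_id, t.depth + 1 AS depth
    FROM candidates AS c
    JOIN tab AS t ON t.id = c.from_id
    WHERE t.depth < 3 AND c.id NOT IN (SELECT id FROM tab)
""")
print(conn.execute("SELECT id, depth FROM tab ORDER BY id").fetchall())
# [(1, 0), (2, 1), (3, 2), (4, 3), (5, 1)]
```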
I am looking for suggestions to speed this up. Together with two other operations in one transaction, this takes several minutes for fewer than 10k rows.
I am working from R, using the RPostgres package; however, I believe this is more of an SQL / database problem.
You can try whether left joining `tab` and checking that its `id` is `NULL` brings you a benefit.
```sql
INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
            LEFT JOIN tab t2
                      ON t2.id = c1.id
       WHERE t1.depth + 1 < 3
             AND t2.id IS NULL;
```
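The `LEFT JOIN ... IS NULL` anti-join can be tried out on toy data as well. This sketch runs it against an in-memory SQLite database (a stand-in for PostgreSQL, assumption) with made-up rows, joins on the question's column `from_id`, and uses the question's filter `t1.depth < 3` so the result is comparable with the original query; note that `t1.depth + 1 < 3` is one level stricter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id INTEGER PRIMARY KEY, value TEXT, from_id INTEGER, depth INTEGER);
    CREATE TABLE candidates (id INTEGER, value TEXT, from_id INTEGER);
    INSERT INTO tab VALUES (1, 'root', NULL, 0), (2, 'a', 1, 1), (3, 'b', 2, 2);
    INSERT INTO candidates VALUES (2, 'a', 1), (5, 'd', 1), (6, 'e', 3), (7, 'f', 99);
""")

cur = conn.execute("""
    INSERT INTO tab (id, value, from_id, depth)
    SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
    FROM candidates c1
    INNER JOIN tab t1 ON t1.id = c1.from_id  -- the parent must already be in tab
    LEFT JOIN tab t2 ON t2.id = c1.id        -- look for an existing row with the same id
    WHERE t1.depth < 3
      AND t2.id IS NULL                      -- keep only candidates without such a row
""")
print(cur.rowcount)  # 2
print(conn.execute("SELECT id, depth FROM tab ORDER BY id").fetchall())
# [(1, 0), (2, 1), (3, 2), (5, 1), (6, 3)]
```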
Along with this, try putting indexes on `tab (id, depth)` and `candidates (from_id)`.
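The suggested indexes could be created as sketched below. The index names are hypothetical, the `candidates` index uses the question's `from_id` column, and SQLite once more stands in for PostgreSQL (assumption; the `CREATE INDEX` statements look the same in both). `EXPLAIN QUERY PLAN` is SQLite's way of showing the chosen plan; in PostgreSQL you would use `EXPLAIN` instead. Since `tab.id` is the primary key, it is already indexed, so the composite index mainly adds `depth`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id INTEGER PRIMARY KEY, value TEXT, from_id INTEGER, depth INTEGER);
    CREATE TABLE candidates (id INTEGER, value TEXT, from_id INTEGER);

    -- Hypothetical index names; the columns follow the suggestion in the text.
    CREATE INDEX tab_id_depth_idx ON tab (id, depth);
    CREATE INDEX candidates_from_id_idx ON candidates (from_id);
""")

# Explicitly created indexes keep their CREATE statement in sqlite_master.sql.
indexes = sorted(
    name
    for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND sql IS NOT NULL"
    )
)
print(indexes)  # ['candidates_from_id_idx', 'tab_id_depth_idx']

# Ask the planner how it would look up candidates by parent id.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM candidates WHERE from_id = 1"
).fetchall()
uses_index = any("candidates_from_id_idx" in row[-1] for row in plan)
print(uses_index)
```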
Another option is a correlated subquery with `NOT EXISTS`.
```sql
INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
       WHERE t1.depth + 1 < 3
             AND NOT EXISTS (SELECT *
                                    FROM tab t2
                                    WHERE t2.id = c1.id);
```
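To check that the `NOT EXISTS` form selects the same rows as the `LEFT JOIN` form, this sketch runs both as plain `SELECT`s on the same made-up data in an in-memory SQLite database (a stand-in for PostgreSQL, assumption), using the question's column `from_id` and filter `t1.depth < 3`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab (id INTEGER PRIMARY KEY, value TEXT, from_id INTEGER, depth INTEGER);
    CREATE TABLE candidates (id INTEGER, value TEXT, from_id INTEGER);
    INSERT INTO tab VALUES (1, 'root', NULL, 0), (2, 'a', 1, 1), (3, 'b', 2, 2);
    INSERT INTO candidates VALUES (2, 'a', 1), (5, 'd', 1), (6, 'e', 3), (7, 'f', 99);
""")

# Correlated NOT EXISTS: for each candidate, probe tab for a row with the same id.
not_exists_rows = conn.execute("""
    SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
    FROM candidates c1
    INNER JOIN tab t1 ON t1.id = c1.from_id
    WHERE t1.depth < 3
      AND NOT EXISTS (SELECT * FROM tab t2 WHERE t2.id = c1.id)
    ORDER BY c1.id
""").fetchall()

# LEFT JOIN ... IS NULL: keep candidates that found no partner in tab.
left_join_rows = conn.execute("""
    SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
    FROM candidates c1
    INNER JOIN tab t1 ON t1.id = c1.from_id
    LEFT JOIN tab t2 ON t2.id = c1.id
    WHERE t1.depth < 3
      AND t2.id IS NULL
    ORDER BY c1.id
""").fetchall()

print(not_exists_rows)                    # [(5, 'd', 1, 1), (6, 'e', 3, 3)]
print(not_exists_rows == left_join_rows)  # True
```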
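Either way, if `tab` has a lot of rows, you likely need to get rid of the `NOT IN` clause to improve performance. PostgreSQL can execute `NOT EXISTS` and `LEFT JOIN ... IS NULL` as an anti-join, but it typically evaluates `NOT IN (subquery)` as a subplan, which degrades badly once the subquery result no longer fits into `work_mem`.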
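`NOT IN` also differs from `NOT EXISTS` when the subquery can return `NULL`, which is the usual explanation for why PostgreSQL generally cannot plan it as an anti-join. The sketch below, with two hypothetical tables `a` and `b` in SQLite (standard SQL semantics, same in PostgreSQL), shows the difference. Because `tab.id` is a primary key and therefore never `NULL`, the rewrites in this answer return the same rows as the original `NOT IN` query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER);
    CREATE TABLE b (y INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL rather than TRUE, so no row survives.
not_in = conn.execute(
    "SELECT x FROM a WHERE x NOT IN (SELECT y FROM b) ORDER BY x"
).fetchall()

# NOT EXISTS only asks whether a matching row exists; the NULL row never matches.
not_exists = conn.execute(
    "SELECT x FROM a WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.y = a.x) ORDER BY x"
).fetchall()

print(not_in)      # []
print(not_exists)  # [(2,)]
```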
And get into the habit of always listing the target columns explicitly in an `INSERT` statement, as the statement may otherwise break if you make changes to the target table, e.g. adding a column.
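One way this bites, in PostgreSQL as well as in SQLite used here as a stand-in (assumption), is a change in column order, e.g. after the table has been recreated: without a column list, values are matched by position and silently end up in the wrong columns. (PostgreSQL fills a shorter implicit list into the first columns, so merely appending a column does not break it there.) The reordered table layout below is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical later version of tab, recreated with depth placed before from_id.
conn.execute(
    "CREATE TABLE tab (id INTEGER PRIMARY KEY, value TEXT, depth INTEGER, from_id INTEGER)"
)

# Rows produced in the query's column order: (id, value, from_id, depth).
row_implicit = (1, "a", 7, 2)
row_explicit = (2, "b", 7, 2)

# No column list: values are matched by position, so from_id and depth are swapped.
conn.execute("INSERT INTO tab VALUES (?, ?, ?, ?)", row_implicit)

# Explicit column list: values are matched by name and land where they belong.
conn.execute(
    "INSERT INTO tab (id, value, from_id, depth) VALUES (?, ?, ?, ?)", row_explicit
)

rows = conn.execute("SELECT id, from_id, depth FROM tab ORDER BY id").fetchall()
print(rows)  # [(1, 2, 7), (2, 7, 2)]
```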