Efficiently insert into table in Postgres

Question

In my database, I store a tree-like data structure in a table tab with the columns id (primary key), value, from_id and depth, where depth (integer) is the distance from the root of the tree.

Now I would like to add rows to tab from another table, candidates (columns id, value, from_id), with two restrictions: 1) only rows with a new id, and 2) only rows whose depth is below some given threshold (e.g. 3 or 4).

There may be more than one from_id in tab that point to a new row in candidates.

Being a Postgres beginner, I hope my approach is correct, but it is very inefficient:

insert into tab
select distinct c.id, c.value, c.from_id, t.depth+1 as depth
from candidates as c
join tab as t on t.id=c.from_id
where depth<3 and c.id not in
(select id from tab);

I am looking for suggestions to speed this up. Together with two other operations in one transaction, this takes several minutes for less than 10k rows.

I am working from R, using the RPostgres package, but I believe this is more of a SQL/database problem.

Answer 1

You can check whether left joining tab and testing for its id being NULL gives you a benefit.

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
            LEFT JOIN tab t2
                      ON t2.id = c1.id
       WHERE t1.depth + 1 < 3
             AND t2.id IS NULL;

Along with this, try adding indexes on tab (id, depth) and candidates (from_id).
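The two indexes could be created roughly as follows. This is only a sketch: the index names are made up for illustration, and it assumes the join column in candidates is from_id, as in the question's table description.

```sql
-- Hypothetical index names; the columns follow the suggestion above.
CREATE INDEX IF NOT EXISTS tab_id_depth_idx
    ON tab (id, depth);

CREATE INDEX IF NOT EXISTS candidates_from_id_idx
    ON candidates (from_id);

-- Refresh planner statistics so the new indexes are taken into account.
ANALYZE tab;
ANALYZE candidates;
```

Since id is already the primary key of tab, it has an index of its own; the composite index mainly lets the planner read depth without visiting the table.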

Another option is a correlated subquery with NOT EXISTS .

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
       WHERE t1.depth + 1 < 3
             AND NOT EXISTS (SELECT *
                                    FROM tab t2
                                    WHERE t2.id = c1.id);

Either way, you likely need to get rid of the IN clause to improve performance if tab has a lot of rows.

And get accustomed to always writing out the target columns explicitly in an INSERT statement, as the statement may otherwise break if you change the target table, e.g. by adding a column.
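To see which variant actually helps on your data, you can compare the execution plans. The following is a sketch using the table and column names from the question. Because EXPLAIN ANALYZE really executes the INSERT, it is wrapped in a transaction that is rolled back afterwards.

```sql
BEGIN;

-- ANALYZE runs the statement and reports actual row counts and timings;
-- BUFFERS adds buffer usage to the output.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO tab (id, value, from_id, depth)
SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
FROM candidates c1
     INNER JOIN tab t1 ON t1.id = c1.from_id
WHERE t1.depth + 1 < 3
      AND NOT EXISTS (SELECT * FROM tab t2 WHERE t2.id = c1.id);

-- Undo the insert so the measurement can be repeated with another variant.
ROLLBACK;
```

Running the same wrapper around the original NOT IN query shows whether the subquery is what dominates the run time.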
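A further variant, not part of the suggestions above, is to let the primary key on tab.id reject rows that already exist by using ON CONFLICT DO NOTHING (available from PostgreSQL 9.5). This is a sketch under the question's schema, and whether it is faster than the anti-join depends on the data, so it is worth measuring.

```sql
INSERT INTO tab (id, value, from_id, depth)
SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
FROM candidates c1
     INNER JOIN tab t1 ON t1.id = c1.from_id
WHERE t1.depth + 1 < 3
-- Rows whose id is already in tab are skipped instead of raising an error.
ON CONFLICT (id) DO NOTHING;
```

If the SELECT yields the same id more than once, only one of those rows is inserted, and which one is not defined by this statement.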

solution1
1 ACCEPTED 2019-12-01 23:34:02