
Efficiently insert into table in Postgres

In my database, I save a tree-like data structure as a table tab with the columns id (primary key), value, from_id and depth, where depth (integer) represents the distance from the root of the tree.

Now I would like to add rows to the table tab from another table candidates (columns id, value, from_id), but with two restrictions: 1) only new ids and 2) only rows where the depth is below some given threshold (e.g. 3 or 4).

There may be more than one from_id in tab that points to a new row in candidates.
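For reference, the two tables described above might look roughly like the sketch below. The post only names the columns, so the column types and the NULL parent for the root row are assumptions made for illustration.

```sql
-- Assumed column types; the post only names the columns.
CREATE TABLE tab (
    id      integer PRIMARY KEY,
    value   text,
    from_id integer,           -- id of the parent row; NULL for the root (assumption)
    depth   integer NOT NULL   -- distance from the root of the tree
);

CREATE TABLE candidates (
    id      integer,
    value   text,
    from_id integer            -- id of the intended parent row in tab
);
```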

Being a Postgres beginner, I hope my approach is correct, but it is very inefficient:

insert into tab
select distinct c.id, c.value, c.from_id, t.depth+1 as depth
from candidates as c
join tab as t on t.id=c.from_id
where depth<3 and c.id not in
(select id from tab);

I am looking for suggestions to speed this up. Together with two other operations in one transaction, this takes several minutes for fewer than 10k rows.

I am working from R, using the RPostgres package, but I believe this is more of a SQL/database problem.

You can check whether left joining tab and testing its id for NULL gives you a benefit.

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
            LEFT JOIN tab t2
                      ON t2.id = c1.id
       WHERE t1.depth + 1 < 3
             AND t2.id IS NULL;

Along with this, try putting indexes on tab (id, depth) and candidates (from_id).
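The index suggestion could be written as in the sketch below. The index names are made up for illustration, and since id is already the primary key of tab it has a unique index of its own, so whether the extra (id, depth) index pays off is something to measure rather than assume.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
-- tab.id already has a unique index from the primary key; the composite
-- index may let the planner read depth without visiting the table.
CREATE INDEX IF NOT EXISTS tab_id_depth_idx
    ON tab (id, depth);

-- Supports the join condition t1.id = c1.from_id from the candidates side.
CREATE INDEX IF NOT EXISTS candidates_from_id_idx
    ON candidates (from_id);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE tab;
ANALYZE candidates;
```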

Another option is a correlated subquery with NOT EXISTS.

INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
       WHERE t1.depth + 1 < 3
             AND NOT EXISTS (SELECT *
                                    FROM tab t2
                                    WHERE t2.id = c1.id);

Either way, if tab has a lot of rows you likely need to get rid of the IN clause to improve performance.
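To see which variant actually helps on your data, one approach is to compare the execution plans. The sketch below shows it for the NOT EXISTS form, using the column names from the question. EXPLAIN ANALYZE really runs the statement, so it is wrapped in a transaction that is rolled back.

```sql
BEGIN;

-- Prints the chosen plan together with actual timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO tab (id, value, from_id, depth)
SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
FROM candidates c1
     INNER JOIN tab t1 ON t1.id = c1.from_id
WHERE t1.depth + 1 < 3
  AND NOT EXISTS (SELECT * FROM tab t2 WHERE t2.id = c1.id);

-- EXPLAIN ANALYZE executed the INSERT; undo it so the test is repeatable.
ROLLBACK;
```

Running the same wrapper around the original NOT IN query and the LEFT JOIN variant gives directly comparable timings.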

And get accustomed to always writing out the target columns explicitly in an INSERT statement; otherwise the statement may break if you change the target table, e.g. by adding a column.
