Efficiently insert into table in Postgres
In my database, I store a tree-like data structure as a table tab with the columns id (primary key), value, from_id and depth, where depth (integer) represents the distance from the root of the tree.
Now I would like to add rows to the table tab from another table candidates (columns id, value, from_id), but with two restrictions: 1) only rows with a new id, and 2) only rows where the depth is below some given threshold (e.g. 3 or 4).
There may be more than one from_id in tab that points to a new row in candidates.
Being a Postgres beginner, I hope my approach is correct, but it is very inefficient:
insert into tab
select distinct c.id, c.value, c.from_id, t.depth+1 as depth
from candidates as c
join tab as t on t.id=c.from_id
where depth<3 and c.id not in
(select id from tab);
I am looking for suggestions to speed this up. Together with two other operations in one transaction, this takes several minutes for fewer than 10k rows.
I am working from R, using the RPostgres package, but I believe this is more of an SQL / database problem.
You can check whether left joining tab and testing its id for NULL brings you a benefit.
INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            -- t1 is the parent row that supplies the depth
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
            -- t2 matches only if the candidate id already exists in tab
            LEFT JOIN tab t2
                      ON t2.id = c1.id
       WHERE t1.depth + 1 < 3
             AND t2.id IS NULL;
Along with this, try putting indexes on tab (id, depth) and candidates (from_id).
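The index suggestion above could look like the following sketch. It assumes the join column in candidates is from_id, as listed in the question, and the index names are made up for illustration. Since tab.id is the primary key, Postgres already maintains a unique index on it, so the composite index may or may not help; measuring is the only way to know.

```sql
-- Hypothetical index names; adjust to your own naming scheme.
-- Supports the join between candidates.from_id and tab.id.
CREATE INDEX IF NOT EXISTS candidates_from_id_idx
    ON candidates (from_id);

-- Lets the planner read depth from the index when looking up a parent by id.
CREATE INDEX IF NOT EXISTS tab_id_depth_idx
    ON tab (id, depth);

-- Refresh planner statistics after creating the indexes.
ANALYZE candidates;
ANALYZE tab;
```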
Another option is a correlated subquery with NOT EXISTS.
INSERT INTO tab
            (id,
             value,
             from_id,
             depth)
SELECT c1.id,
       c1.value,
       c1.from_id,
       t1.depth + 1
       FROM candidates c1
            -- t1 is the parent row that supplies the depth
            INNER JOIN tab t1
                       ON t1.id = c1.from_id
       WHERE t1.depth + 1 < 3
             -- keep only candidates whose id is not yet in tab
             AND NOT EXISTS (SELECT *
                                    FROM tab t2
                                    WHERE t2.id = c1.id);
Either way, to improve performance you likely need to get rid of the IN clause if tab has a lot of rows.
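To see whether a rewrite actually helps, one way is to compare the execution plans. The sketch below assumes the join column is from_id, as in the question. Note that EXPLAIN ANALYZE really executes the INSERT, so it is wrapped in a transaction that is rolled back afterwards.

```sql
BEGIN;

-- Runs the statement and reports the actual plan, timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
INSERT INTO tab (id, value, from_id, depth)
SELECT c1.id, c1.value, c1.from_id, t1.depth + 1
FROM candidates c1
INNER JOIN tab t1 ON t1.id = c1.from_id
WHERE t1.depth + 1 < 3
  AND NOT EXISTS (SELECT 1 FROM tab t2 WHERE t2.id = c1.id);

-- Undo the insert so the test can be repeated with another variant.
ROLLBACK;
```

Running the same wrapper around the original NOT IN version and comparing the reported execution times should show which variant the planner handles better on your data.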
Also, get accustomed to always explicitly writing down the target columns in an INSERT statement, as the statement may otherwise break if you make changes to the target table, e.g. adding a column.