How to improve PostgreSQL performance on INSERT?
I have written a Node.js application that writes lots of records to a PostgreSQL 9.6 database. Unfortunately, it feels quite slow. To be able to test things I have created a short but complete program that reproduces the scenario:
'use strict';

const async = require('async'),
      pg = require('pg'),
      uuid = require('uuidv4');

const pool = new pg.Pool({
  protocol: 'pg',
  user: 'golo',
  host: 'localhost',
  port: 5432,
  database: 'golo'
});

const records = [];

for (let i = 0; i < 10000; i++) {
  records.push({ id: uuid(), revision: i, data: { foo: 'bar', bar: 'baz' }, flag: true });
}

pool.connect((err, database, close) => {
  if (err) {
    /* eslint-disable no-console */
    return console.log(err);
    /* eslint-enable no-console */
  }

  database.query(`
    CREATE TABLE IF NOT EXISTS "foo" (
      "position" bigserial NOT NULL,
      "id" uuid NOT NULL,
      "revision" integer NOT NULL,
      "data" jsonb NOT NULL,
      "flag" boolean NOT NULL,
      CONSTRAINT "foo_pk" PRIMARY KEY("position"),
      CONSTRAINT "foo_index_id_revision" UNIQUE ("id", "revision")
    );
  `, errQuery => {
    if (errQuery) {
      /* eslint-disable no-console */
      return console.log(errQuery);
      /* eslint-enable no-console */
    }

    async.series({
      beginTransaction (done) {
        /* eslint-disable no-console */
        console.time('foo');
        /* eslint-enable no-console */
        database.query('BEGIN', done);
      },
      saveRecords (done) {
        async.eachSeries(records, (record, doneEach) => {
          database.query({
            name: 'save',
            text: `
              INSERT INTO "foo"
                ("id", "revision", "data", "flag")
              VALUES
                ($1, $2, $3, $4) RETURNING position;
            `,
            values: [ record.id, record.revision, record.data, record.flag ]
          }, (errQuery2, result) => {
            if (errQuery2) {
              return doneEach(errQuery2);
            }
            record.position = Number(result.rows[0].position);
            doneEach(null);
          });
        }, done);
      },
      commitTransaction (done) {
        database.query('COMMIT', done);
      }
    }, errSeries => {
      /* eslint-disable no-console */
      console.timeEnd('foo');
      /* eslint-enable no-console */
      if (errSeries) {
        return database.query('ROLLBACK', errRollback => {
          close();
          if (errRollback) {
            /* eslint-disable no-console */
            return console.log(errRollback);
            /* eslint-enable no-console */
          }
          /* eslint-disable no-console */
          console.log(errSeries);
          /* eslint-enable no-console */
        });
      }
      close();
      /* eslint-disable no-console */
      console.log('Done!');
      /* eslint-enable no-console */
    });
  });
});
The performance I get for inserting 10,000 rows is 2.5 seconds. This is not bad, but also not great. What can I do to improve speed?
Some thoughts that I had so far:
- Insert multiple rows at once using a single INSERT command. Unfortunately, this is not possible, as in reality the number of records that need to be written varies from call to call, and a varying number of arguments makes it impossible to use prepared statements.
- Using COPY instead of INSERT: I can't use this, since this happens at runtime, not at initialization time.
- Using text instead of jsonb: Didn't change a thing.
- Using json instead of jsonb: Didn't change a thing either.

A few more notes on the data that happens in reality:

- revision is not necessarily increasing. This is just a number.
- flag is not always true; it can be true and false as well.
- Of course, the data field contains different data, too.

So in the end it comes down to: What can I do to significantly speed up multiple single calls of INSERT?

"Insert multiple rows at once using a single INSERT command. Unfortunately, this is not possible, as in reality, the number of records that need to be written varies from call to call and a varying number of arguments makes it impossible to use prepared statements."
This is the right answer, followed by an invalid counter-argument.
You can generate your multi-row inserts in a loop, with some 1,000 to 10,000 records per query, depending on the size of the records. And you do not need prepared statements for this at all.
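The counter-argument falls apart because the placeholders can be generated to match each batch: the statement is still fully parameterized, it just is not a named prepared statement. A minimal sketch against the question's "foo" table (the function name and shape are mine, not the linked article's):

```javascript
'use strict';

// Builds ONE parameterized multi-row INSERT for the question's "foo" table.
// Placeholders $1..$n are generated per batch, so a varying number of
// records needs no prepared statement; pg still escapes all values safely.
function buildMultiRowInsert (records) {
  const values = [];
  const rows = records.map((record, i) => {
    const base = i * 4;
    values.push(record.id, record.revision, record.data, record.flag);
    return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4})`;
  });

  return {
    text: `INSERT INTO "foo" ("id", "revision", "data", "flag") VALUES ${rows.join(', ')} RETURNING "position";`,
    values
  };
}
```

A single `database.query(buildMultiRowInsert(batch), callback)` then replaces the whole async.eachSeries loop; RETURNING still yields one position per row, in insert order.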
See this article I wrote about the same issues: Performance Boost.
Following the article, my code was able to insert 10,000 records in under 50ms.
A related question: Multi-row insert with pg-promise.
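Putting the advice together: split the full record set into fixed-size batches and emit one multi-row statement per batch, all inside the same BEGIN/COMMIT the question already uses. A sketch under the same assumptions (the helper name and the default batch size are mine; tune the size to your row width, per the 1,000 to 10,000 guideline above):

```javascript
'use strict';

// Splits the records into batches and builds one parameterized multi-row
// INSERT per batch. 10,000 single INSERTs become, e.g., 10 statements.
// Run the returned statements sequentially inside one transaction.
function buildBatchedInserts (records, batchSize = 1000) {
  const statements = [];

  for (let offset = 0; offset < records.length; offset += batchSize) {
    const batch = records.slice(offset, offset + batchSize);
    const values = [];
    const rows = batch.map((record, i) => {
      const base = i * 4;
      values.push(record.id, record.revision, record.data, record.flag);
      return `($${base + 1}, $${base + 2}, $${base + 3}, $${base + 4})`;
    });

    statements.push({
      text: `INSERT INTO "foo" ("id", "revision", "data", "flag") VALUES ${rows.join(', ')} RETURNING "position";`,
      values
    });
  }

  return statements;
}
```

Each statement's RETURNING rows come back in insert order, so the positions can still be copied onto the records batch by batch, as the question's loop does per row.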