pg_dump and pg_restore on giant databases
I currently have a task to improve a database structure. For this we want to efficiently dump and restore one single giant database (approx. 1 TB and growing).
To test things with this database, we wanted to transfer it to another server node via pg_dump and pg_restore.
We are running a v10 server ( https://www.postgresql.org/docs/10/app-pgdump.html ), so we are limited to the parameters available there. It is also required to dump the full database, not only parts of it.
For this I tried a couple of approaches, and these sources helped a lot, first and foremost:
The problem is that you can almost only improve one of these tasks, not both simultaneously.
Dumping in directory format is extremely fast (~1 hour), but restoring from it is not.
pg_dump --blobs --dbname="$DBNAME" --file=$DUMPDIR --format=directory --host=$SERVERHOSTNAME --jobs=$THREADS --port=$SERVERPORT --username="$SERVERUSERNAME"
pg_restore --clean --create --dbname=postgres --format=directory --jobs=$THREADS --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "./"
The problem with this restore method is that even though I assigned multiple cores to it, it only uses one, with barely 4% CPU usage on that server core.
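If the bottleneck turns out to be one huge table, one thing that might still help is splitting the restore into sections, so that at least the index and constraint builds of the post-data phase run with all jobs. This is only a sketch under that assumption; it creates the target database up front with createdb instead of using --create:

createdb --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "$DBNAME"
# Schema first (fast, single-threaded) ...
pg_restore --section=pre-data --dbname="$DBNAME" --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "$DUMPDIR"
# ... then the table data in parallel ...
pg_restore --section=data --jobs=$THREADS --dbname="$DBNAME" --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "$DUMPDIR"
# ... then indexes and constraints, which parallelize across objects.
pg_restore --section=post-data --jobs=$THREADS --dbname="$DBNAME" --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "$DUMPDIR"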
Dumping in custom format is so slow that the server couldn't even complete it overnight (session timeout).
pg_dump --blobs --compress=9 --dbname="$DBNAME" --file="$DUMPDIR/db.dump" --format=custom --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME"
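If compression is the bottleneck here (pg_dump compresses single-threaded in custom format, and level 9 is very CPU-hungry), a middle ground might be the directory format with a low compression level, where each parallel worker compresses its own table file. A sketch under that assumption; the level of 1 is arbitrary and worth tuning against your network and storage:

# Parallel dump with cheap per-worker compression instead of -Z 9.
pg_dump --blobs --compress=1 --dbname="$DBNAME" --file="$DUMPDIR" --format=directory --jobs=$THREADS --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME"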
So I had different approaches in mind; piping, however, seems to be an ineffective way of dumping according to the author cited above.
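For reference, a direct pipe between the two nodes would look roughly like this (a hypothetical sketch; $TARGETHOST is not from the original post). It avoids any intermediate file, which matters given the storage limits below, but neither side can parallelize, which is probably why that author considers it ineffective:

# Stream the dump straight into pg_restore on the target node.
pg_dump --blobs --compress=0 --format=custom --dbname="$DBNAME" --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" \
  | pg_restore --clean --create --dbname=postgres --host=$TARGETHOST --port=$SERVERPORT --username="$SERVERUSERNAME"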
Does anyone have more experience with this? Are my ideas useful, or do you have a completely different solution in mind?
Oh, before I forget: we are currently limited to 5 TB on our external server, and the internal server that runs the database should not get bloated with data fragments, even temporarily.
A parallel pg_restore with the directory format should speed up processing. If it doesn't, I suspect that much of the data is in one large table, which pg_restore (and pg_dump) cannot parallelize.
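A quick way to check that hypothesis (a sketch; run with psql against the source database) is to list the largest tables and see whether one of them dominates:

psql --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" --dbname="$DBNAME" --command="
  SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total_size
  FROM pg_class
  WHERE relkind = 'r'
  ORDER BY pg_total_relation_size(oid) DESC
  LIMIT 10;"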
Make sure you disable compression (-z 0) to improve the speed (unless you have a weak network).
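Concretely, that combination would look something like this (a sketch; using postgres as the maintenance connection for --create is an assumption):

# Uncompressed parallel directory dump ...
pg_dump --blobs --compress=0 --format=directory --jobs=$THREADS --file="$DUMPDIR" --dbname="$DBNAME" --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME"
# ... followed by a parallel restore from it.
pg_restore --clean --create --jobs=$THREADS --dbname=postgres --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME" "$DUMPDIR"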
An online file system backup might be considerably faster:
pg_basebackup is simple, but cannot be parallelized.
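For completeness, a minimal invocation (a sketch; $BACKUPDIR and streaming the WAL with --wal-method=stream are assumptions):

# Single-threaded online base backup of the whole cluster.
pg_basebackup --pgdata="$BACKUPDIR" --format=plain --wal-method=stream --progress --host=$SERVERHOSTNAME --port=$SERVERPORT --username="$SERVERUSERNAME"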
Using the low-level API, you can parallelize the backup with operating system or storage techniques.
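Roughly, the low-level approach looks like this (a sketch only: the exclusive backup mode shown is the simpler variant on v10, the rsync call and paths are assumptions, and you also need the WAL generated between start and stop, e.g. from your WAL archive):

psql --dbname="$DBNAME" --command="SELECT pg_start_backup('clone', true);"
# Copy the data directory with as many parallel workers as you like,
# e.g. several rsyncs over disjoint subdirectories.
rsync -a "$PGDATA/" "$TARGETHOST:$PGDATA/"
psql --dbname="$DBNAME" --command="SELECT pg_stop_backup();"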
The disadvantage is that with a file system backup, you can only copy the whole database cluster.