
Creating indexes on big tables - PostgreSQL 9.6

I'm trying to create some regular indexes on a big table (26G), but it takes a lot of time: more than 2 hours in total. Each index takes about 11 minutes.

Maybe I'm wrong and I should concentrate instead on reducing the time it takes to load the data into postgres from oracle (oracle_fdw). I perform a lot of insert into local_postgresql_table select * from remote_oracle_table statements (about 200G in total), which also takes a lot of time.

If changing one of the parameters would improve performance, I would be happy to hear how. Running this query on 26G takes two hours.

Is there a way to improve this operation? Would better hardware help (the server does not appear to be overloaded)?

The parameters that I configured:

min_parallel_relation_size = 200MB
max_parallel_workers_per_gather = 5 
max_worker_processes = 8 
effective_cache_size = 2500MB
work_mem = 16MB
maintenance_work_mem = 1500MB
shared_buffers = 1500MB
RAM: 5G

See this blog post for an example of parallel query processing:

With a parallel sequential scan, multiple background worker processes cooperate to execute a single query. The parallel scan parameters are easy to set and can make a query run up to 10 times faster.

In PostgreSQL 9.6, the max_worker_processes parameter sets the maximum number of background worker processes. Its default value is 8.

One issue with creating X indexes is that if the table size exceeds your cache size, you cannot avoid performing X physical reads of the table.
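To put rough numbers on the point above, here is a minimal Python sketch of the read volume. It is a deliberately simplified model, not a measurement. It assumes a table larger than the cache is re-read from disk in full for every index, and a table that fits is read once. The figure of 11 indexes is an assumption (2 hours divided by about 11 minutes each), and the 2.5 GB cache is taken from effective_cache_size in the question.

```python
def total_table_read_gb(table_gb, index_count, cache_gb):
    """Estimate GB physically read from the table when indexes are built one by one."""
    if index_count <= 0:
        return 0
    if table_gb <= cache_gb:
        # The table fits in cache: only the first build reads it from disk.
        return table_gb
    # The table does not fit: every build has to read it from disk again.
    return table_gb * index_count


if __name__ == "__main__":
    # 26 GB table, 11 indexes (assumed), about 2.5 GB of cache.
    print(total_table_read_gb(26, 11, 2.5))  # prints 286
```

Under these assumptions the sequential builds read about 286 GB from a 26 GB table, which is why reducing the number of passes over the table matters more than tuning each individual build.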

Many years ago, I got around this on Oracle by starting the builds of multiple indexes in different sessions at the same time. This meant that each block was physically read only once for each batch of indexes being created.
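The same idea could be tried on PostgreSQL, where a plain CREATE INDEX takes a SHARE lock that does not conflict with other SHARE locks, so several builds on one table can run at once. The following is a hedged Python sketch, not a tested recipe for this workload. The function name, the connect argument, and the index and column names in the usage note are all hypothetical. Whether the concurrent builds actually share physical reads on your server is something to verify rather than assume.

```python
from concurrent.futures import ThreadPoolExecutor


def build_indexes_in_parallel(statements, connect, max_sessions=3):
    """Run each CREATE INDEX statement in its own database session.

    `connect` is any zero-argument callable that returns a DB-API 2.0
    connection (an assumption of this sketch).
    """
    def run(statement):
        conn = connect()  # one session per index build
        try:
            cur = conn.cursor()
            cur.execute(statement)
            conn.commit()
        finally:
            conn.close()
        return statement

    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        # list() waits for all builds and re-raises the first failure, if any.
        return list(pool.map(run, statements))
```

With psycopg2, for example, connect could be lambda: psycopg2.connect(dsn), and statements a list such as "CREATE INDEX idx_a ON local_postgresql_table (a)" and "CREATE INDEX idx_b ON local_postgresql_table (b)". Keeping max_sessions small matters because each session sorts with its own memory.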

The downside is that you need more sort memory to do this effectively.
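A rough way to check that memory cost against the settings in the question is sketched below in Python. It assumes each concurrent CREATE INDEX may use up to maintenance_work_mem for sorting, on top of shared_buffers. The 512 MB reserved for the operating system and other processes is an assumed figure, and 5G of RAM is treated as 5120 MB.

```python
def index_build_memory_mb(shared_buffers_mb, maintenance_work_mem_mb, sessions):
    """Worst-case memory if every session uses its full maintenance_work_mem."""
    return shared_buffers_mb + maintenance_work_mem_mb * sessions


def max_sessions_within(ram_mb, shared_buffers_mb, maintenance_work_mem_mb,
                        os_reserve_mb=512):
    """How many concurrent index builds fit in RAM (os_reserve_mb is an assumption)."""
    available = ram_mb - os_reserve_mb - shared_buffers_mb
    return max(0, available // maintenance_work_mem_mb)


if __name__ == "__main__":
    # Settings from the question: 5G RAM, shared_buffers and
    # maintenance_work_mem both 1500MB.
    print(index_build_memory_mb(1500, 1500, 3))  # prints 6000
    print(max_sessions_within(5120, 1500, 1500))  # prints 2
    # A lower maintenance_work_mem leaves room for more sessions.
    print(max_sessions_within(5120, 1500, 512))   # prints 6
```

On these numbers, three concurrent builds at 1500MB each would need more than the 5G available, so running several sessions would mean lowering maintenance_work_mem per session or adding RAM.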

Might be worth a try.


 