
PostgreSQL GIN index slower than GIST for pg_trgm?

Despite what all the documentation says, I'm finding GIN indexes to be significantly slower than GIST indexes for pg_trgm-related searches. This is on a table of 25 million rows with a relatively short text field (average length of 21 characters). Most of the rows of text are addresses of the form "123 Main st, City".

The GIST index takes about 4 seconds with a search like:

select suggestion from search_suggestions where suggestion % 'seattle';
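The question doesn't show its index definitions. For reference, trigram indexes of the two kinds are normally created with the operator classes that pg_trgm provides. The index names below are hypothetical (the EXPLAIN output only reveals one index, called tri_suggestions_idx); the table and column names come from the query above.

```sql
-- pg_trgm provides the % operator and the trigram operator classes.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GiST variant (hypothetical index name)
CREATE INDEX suggestions_trgm_gist_idx
    ON search_suggestions USING gist (suggestion gist_trgm_ops);

-- GIN variant (hypothetical index name)
CREATE INDEX suggestions_trgm_gin_idx
    ON search_suggestions USING gin (suggestion gin_trgm_ops);
```

If both exist at once, the planner uses whichever it estimates to be cheaper, so a timing comparison is usually done with only one of them present.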

But GIN takes 90 seconds, and EXPLAIN ANALYZE shows the following:

Bitmap Heap Scan on search_suggestions  (cost=330.09..73514.15 rows=25043 width=22) (actual time=671.606..86318.553 rows=40482 loops=1)
  Recheck Cond: ((suggestion)::text % 'seattle'::text)
  Rows Removed by Index Recheck: 23214341
  Heap Blocks: exact=7625 lossy=223807
  ->  Bitmap Index Scan on tri_suggestions_idx  (cost=0.00..323.83 rows=25043 width=0) (actual time=669.841..669.841 rows=1358175 loops=1)
        Index Cond: ((suggestion)::text % 'seattle'::text)
Planning time: 1.420 ms
Execution time: 86327.246 ms

Note that the index scan selects over a million rows (1,358,175), even though only about 40k rows actually match. Any ideas why this is performing so poorly? This is on PostgreSQL 9.4.

Some issues stand out:

First, consider upgrading to a current version of Postgres. At the time of writing, that's Postgres 9.6, or Postgres 10 (currently in beta). Since 9.4 there have been multiple improvements to GIN indexes, to the additional module pg_trgm, and to handling big data in general.

Next, you need much more RAM, in particular a higher work_mem setting. I can tell from this line in the EXPLAIN output:

Heap Blocks: exact=7625 lossy=223807

"lossy" in the details of a Bitmap Heap Scan (with your particular numbers) indicates a dramatic shortage of work_mem. Because exact row pointers for all candidate rows can't be held in your low work_mem setting, Postgres falls back to collecting only block addresses for most pages in the bitmap index scan. Every row on a lossy block then has to be rechecked in the following Bitmap Heap Scan, so many more non-qualifying rows are read and filtered out this way: that's the 23,214,341 rows reported as "Rows Removed by Index Recheck". This related answer has details.

But don't set work_mem too high without considering the whole situation.
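One way to follow that advice is to raise work_mem only where it's needed instead of globally in postgresql.conf. work_mem limits each sort, hash or bitmap operation separately, so a large global value can multiply across operations and concurrent sessions. A minimal sketch, where the 256MB value is an arbitrary assumption rather than a recommendation:

```sql
BEGIN;
-- SET LOCAL lasts only until the end of this transaction,
-- so other sessions and later transactions keep the default.
SET LOCAL work_mem = '256MB';  -- illustrative value, tune for your server

SELECT suggestion
FROM   search_suggestions
WHERE  suggestion % 'seattle';

COMMIT;
```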

There may be other problems, like index or table bloat, or further configuration bottlenecks. But if you fix just these two items, the query should already be much faster.
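To check whether work_mem really is the bottleneck, one option is to raise it for a single session and re-run the query with EXPLAIN. Again, the 256MB value is only an assumption for the experiment. If the bitmap now fits in memory, the "Heap Blocks:" line should show few or no lossy blocks, and "Rows Removed by Index Recheck" should shrink considerably, since only the candidate rows returned by the index (about 1.36 million in the plan above) still need a recheck instead of every row on every lossy block.

```sql
-- Raise work_mem for the current session only (illustrative value).
SET work_mem = '256MB';

-- Compare the "Heap Blocks:" and "Rows Removed by Index Recheck"
-- lines with the original plan.
EXPLAIN (ANALYZE, BUFFERS)
SELECT suggestion
FROM   search_suggestions
WHERE  suggestion % 'seattle';

-- Return to the server default for this session.
RESET work_mem;
```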

Also, do you really need to retrieve all 40k rows in the example? You probably want to add a small LIMIT to the query and make it a "nearest-neighbor" search, in which case a GiST index is the better choice after all: unlike GIN, a GiST trigram index can return rows ordered by distance, so the scan can stop after the first few matches. Example:
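A minimal sketch of such a query, assuming a GiST index with gist_trgm_ops on suggestion (as sketched for the question's setup); the LIMIT of 10 is arbitrary. pg_trgm's <-> operator returns the trigram distance, which is one minus the similarity.

```sql
-- Nearest-neighbor search: the 10 suggestions closest to 'seattle'.
-- A GiST trigram index can deliver rows in <-> order directly,
-- so only the first few index entries need to be visited.
SELECT suggestion, suggestion <-> 'seattle' AS dist
FROM   search_suggestions
ORDER  BY suggestion <-> 'seattle'
LIMIT  10;
```

Adding WHERE suggestion % 'seattle' would additionally discard rows below the similarity threshold, at the cost of possibly returning fewer than 10 rows.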
