Why is Postgres scanning a huge table instead of using my index?

I noticed one of my SQL queries is much slower than I expected, and it turns out that the query planner is coming up with a plan that seems really bad to me. My query looks like this:

select A.style, count(B.x is null) as missing, count(*) as total
  from A left join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total

B has a (type, id) index, and A has a (country_code, style) index. A is much smaller than B: 250K rows in A vs 100M in B.

So, I expected the query plan to look something like:

  • Use the index on A to select just those rows with an appropriate country_code
  • Left join with B, to find the matching row (if any) based on its (type, id) index
  • Group things according to country_code and style
  • Add up the counts

But the query planner decides the best way to do this is a sequential scan on B, and then a right join against A. I can't fathom why that is; does anyone have an idea? Here's the actual query plan it generated:

 Sort  (cost=14283513.27..14283513.70 rows=171 width=595)
   Sort Key: a.country_code, (count(*))
   ->  HashAggregate  (cost=14283505.22..14283506.93 rows=171 width=595)
         ->  Hash Right Join  (cost=8973.71..14282810.03 rows=55615 width=595)
               Hash Cond: ((b.type = a.type) AND (b.id = a.id))
               ->  Seq Scan on b (cost=0.00..9076222.44 rows=129937844 width=579)
               ->  Hash  (cost=8139.49..8139.49 rows=55615 width=28)
                     ->  Bitmap Heap Scan on a  (cost=1798.67..8139.49 rows=55615 width=28)
                           Recheck Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))
                           ->  Bitmap Index Scan on a_country_code_type_idx  (cost=0.00..1784.76 rows=55615 width=0)
                                 Index Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))

Edit: following a clue from the comments on another question, I tried it with SET ENABLE_SEQSCAN TO OFF; and the query runs ten times as fast. Obviously I don't want to permanently disable sequential scans, but this helps confirm my otherwise-baseless guess that the sequential scan is not the best plan available.
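For reference, here is a minimal sketch of how that experiment can be scoped to a single transaction, so the setting never leaks into other work on the same connection. SET LOCAL and EXPLAIN (ANALYZE, BUFFERS) are standard Postgres; the query is the one above, unchanged. Note that enable_seqscan = off does not forbid sequential scans outright, it only makes the planner treat them as very expensive.

```sql
BEGIN;

-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;

-- EXPLAIN ANALYZE really executes the query and reports actual row counts
-- and timings next to the planner's estimates; BUFFERS adds I/O detail.
EXPLAIN (ANALYZE, BUFFERS)
select A.style, count(B.x is null) as missing, count(*) as total
  from A left join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total;

-- The SET LOCAL value is discarded when the transaction ends.
ROLLBACK;
```

Running the same EXPLAIN (ANALYZE, BUFFERS) without the SET LOCAL line gives the other half of the comparison: estimated vs. actual rows for each plan.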

If the query is actually faster with an index scan, as your added test proves, then it's typically one or both of these:

  • Your statistics are off or not precise enough to cover irregular data distribution.
  • Your cost settings, which Postgres uses as the basis for its cost estimation, are off.

Details for both are in this closely related answer:
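As a rough sketch of what checking those two points could look like: the commands below are standard Postgres, but every number is an illustrative assumption, not a recommendation for this particular database. The table and column names are taken from the question.

```sql
-- 1. Statistics: refresh what the planner knows about both tables.
ANALYZE a;
ANALYZE b;

-- Assumption: the join columns of b are irregularly distributed, so a larger
-- sample may help. 1000 is an illustrative value; the default target is 100.
ALTER TABLE b ALTER COLUMN type SET STATISTICS 1000;
ALTER TABLE b ALTER COLUMN id   SET STATISTICS 1000;
ANALYZE b;  -- the new target only takes effect at the next ANALYZE

-- 2. Cost settings: try values in the current session before touching
-- postgresql.conf. Both values below are assumptions for illustration.
SET random_page_cost = 1.1;        -- default is 4.0; lower values favor index scans
SET effective_cache_size = '4GB';  -- planner's estimate of memory available for caching
```

After either change, re-running EXPLAIN on the query shows whether the planner's choice moved.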

Probably your db is right. It looks like there are 55k matching rows for the first filter. Running that many index scan iterations can be extremely time consuming. Usually hash joins are faster for joins that are not very selective.

Anyway, you can try a few things:

  • Remove the left keyword and use an inner join.
  • Analyze your tables.
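A minimal sketch of those two suggestions, using the table names from the question. One caveat worth stating loudly: an inner join is not equivalent to the original query, because rows of A without a match in B are dropped, so the "missing" column has been left out here. Treat it purely as an experiment to see how the planner reacts.

```sql
-- Refresh planner statistics first.
ANALYZE a;
ANALYZE b;

-- Same query shape with an inner join instead of a left join.
-- Unmatched rows of A no longer appear, so only "total" is computed.
EXPLAIN ANALYZE
select A.style, count(*) as total
  from A join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total;
```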
