Efficient PostgreSQL query on timestamp using index or bitmap index scan?

In PostgreSQL, I have an index on a date field of my tickets table. When I compare the field against now(), the query is pretty efficient:

# explain analyze select count(1) as count from tickets where updated_at > now();
                                                             QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=90.64..90.66 rows=1 width=0) (actual time=33.238..33.238 rows=1 loops=1)
   ->  Index Scan using tickets_updated_at_idx on tickets  (cost=0.01..90.27 rows=74 width=0) (actual time=0.016..29.318 rows=40250 loops=1)
         Index Cond: (updated_at > now())
Total runtime: 33.271 ms

It goes downhill and switches to a Bitmap Heap Scan if I compare the field against now() minus an interval.

# explain analyze select count(1) as count from tickets where updated_at > (now() - '24 hours'::interval);
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate  (cost=180450.15..180450.17 rows=1 width=0) (actual time=543.898..543.898 rows=1 loops=1)
->  Bitmap Heap Scan on tickets  (cost=21296.43..175963.31 rows=897368 width=0) (actual time=251.700..457.916 rows=924373 loops=1)
     Recheck Cond: (updated_at > (now() - '24:00:00'::interval))
     ->  Bitmap Index Scan on tickets_updated_at_idx  (cost=0.00..20847.74 rows=897368 width=0) (actual time=238.799..238.799 rows=924699 loops=1)
           Index Cond: (updated_at > (now() - '24:00:00'::interval))
Total runtime: 543.952 ms

Is there a more efficient way to query using date arithmetic?

The 1st query expects to find rows=74, but actually finds rows=40250.
The 2nd query expects to find rows=897368 and actually finds rows=924699.

Of course, processing 23 times as many rows takes considerably more time, so your actual times are not surprising.

Statistics for data with updated_at > now() are outdated. Run:

ANALYZE tickets;

and repeat your queries. And do you seriously have data with updated_at > now()? That sounds wrong.

It's not surprising, however, that statistics are outdated for the most recently changed data. That is in the nature of things: rows changed since the last ANALYZE are not yet reflected in the statistics. If your query depends on current statistics, you have to run ANALYZE before you run your query.
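To judge whether stale statistics are a plausible cause, it can help to check when the table was last analyzed. A minimal sketch, assuming the table is named tickets as in the question and is visible through the pg_stat_user_tables statistics view:

```sql
-- When were planner statistics for "tickets" last refreshed,
-- either manually (last_analyze) or by autovacuum (last_autoanalyze)?
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tickets';

-- After refreshing the statistics, compare the estimated row count
-- with the actual one in the plan output again.
EXPLAIN ANALYZE
SELECT count(1) AS count
FROM   tickets
WHERE  updated_at > now();
```

If both timestamps are NULL or much older than the bulk of recent updates, a large gap between estimated and actual rows is to be expected.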

Also test with (in your session only):

SET enable_bitmapscan = off;

and repeat your second query to see the timing without a bitmap index scan.
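One way to keep the experiment from leaking into later work is to scope the setting to a single transaction with SET LOCAL. This is only a sketch that reuses the second query from the question:

```sql
BEGIN;

-- SET LOCAL lasts only until the end of the current transaction.
SET LOCAL enable_bitmapscan = off;

EXPLAIN ANALYZE
SELECT count(1) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

ROLLBACK;  -- the setting reverts together with the transaction
```

Note that the enable_* settings discourage a plan type rather than forbid it, and with bitmap scans switched off the planner may pick a sequential scan instead of a plain index scan, depending on its cost estimates.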

Why bitmap index scan for more rows?

A plain index scan fetches rows from the heap one at a time, in the order they are found in the index. That's simple, dumb and without overhead. It is fast for few rows, but may end up more expensive than a bitmap index scan as the number of rows grows.

A bitmap index scan first collects the matching row locations from the index and only then visits the table. If multiple rows reside on the same data page, that saves repeated visits and can make things considerably faster. The more rows, the greater the chance that a bitmap index scan will save time.

For even more rows (around 5% of the table, though this depends heavily on the actual data), the planner switches to a sequential scan of the table and doesn't use the index at all.

The optimum would be an index-only scan, introduced with Postgres 9.2. That's only possible if some preconditions are met. If all relevant columns are included in the index, the index type supports it, and the visibility map indicates that all rows on a data page are visible to all transactions, then that page doesn't have to be fetched from the heap (the table) and the information in the index is enough.
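Whether an index-only scan is realistic for this table can be checked roughly as follows. This is a sketch assuming Postgres 9.2 or later and the tickets table from the question; VACUUM is what maintains the visibility map mentioned above.

```sql
-- VACUUM updates the visibility map; ANALYZE refreshes the statistics.
VACUUM ANALYZE tickets;

-- Rough indicator: how many of the table's pages are marked all-visible.
-- Both columns are estimates maintained by VACUUM and ANALYZE.
-- (Add a schema filter if several tables are named "tickets".)
SELECT relpages, relallvisible
FROM   pg_class
WHERE  relname = 'tickets';

-- The query only needs updated_at, which the index already contains.
EXPLAIN ANALYZE
SELECT count(*)
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);
```

If the planner picks an Index Only Scan, the plan includes a "Heap Fetches" figure. On a table where the most recent rows are changed constantly, many of the relevant pages may not be all-visible, so the benefit can be smaller than hoped.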

The decision depends on your statistics (how many rows Postgres expects to find and their distribution) and on cost settings, most importantly random_page_cost, cpu_index_tuple_cost and effective_cache_size.
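To see how these settings influence the plan, they can be inspected and then changed for a single transaction. The values below are purely illustrative assumptions, not recommendations. Suitable values depend on your storage and available memory.

```sql
-- Inspect the current values.
SHOW random_page_cost;
SHOW cpu_index_tuple_cost;
SHOW effective_cache_size;

BEGIN;

-- Illustrative values only; they revert at the end of the transaction.
SET LOCAL random_page_cost = 1.1;
SET LOCAL effective_cache_size = '4GB';

EXPLAIN ANALYZE
SELECT count(1) AS count
FROM   tickets
WHERE  updated_at > (now() - '24 hours'::interval);

ROLLBACK;
```

A lower random_page_cost makes index access look cheaper relative to sequential reads, which tends to favor index and bitmap scans over a sequential scan.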
