
Postgres query takes longer than expected, even with indexes on the fields

I'm working on optimizing a Postgres table that stores information from a log file.

Here is the query:

SELECT c_ip as ip
     , x_ctx as file_name
     , date_time
     , live
     , c_user_agent as user_agent 
FROM events 
WHERE x_event = 'play' 
  AND date = '2012-12-01' 
  AND username = 'testing'

There are b-tree indexes on x_event, date, and username. The table has around 25 million rows. Right now the query takes about 40 seconds (I originally estimated 20-25) and returns about 143,000 rows.

Is that time expected? I would have thought it would be faster because of the indexes. Is it perhaps because of the sheer amount of data it has to go through?

EDIT: Here is the EXPLAIN ANALYZE:

Bitmap Heap Scan on events  (cost=251347.32..373829.74 rows=35190 width=56) (actual time=5768.409..6124.313 rows=143061 loops=1)
  Recheck Cond: ((date = '2012-12-01'::date) AND (username = 'testing'::text) AND (x_event = 'play'::text))
  ->  BitmapAnd  (cost=251347.32..251347.32 rows=35190 width=0) (actual time=5762.083..5762.083 rows=0 loops=1)
        ->  Bitmap Index Scan on index_events_fresh_date  (cost=0.00..10247.04 rows=554137 width=0) (actual time=57.568..57.568 rows=572221 loops=1)
              Index Cond: (date = '2012-12-01'::date)
        ->  Bitmap Index Scan on index_events_fresh_username  (cost=0.00..116960.55 rows=6328206 width=0) (actual time=3184.053..3184.053 rows=6245831 loops=1)
              Index Cond: (username = 'testing'::text)
        ->  Bitmap Index Scan on index_events_fresh_x_event  (cost=0.00..124112.84 rows=6328206 width=0) (actual time=2478.919..2478.919 rows=6245841 loops=1)
              Index Cond: (x_event = 'play'::text)
Total runtime: 6148.313 ms

I have several questions about that:

  1. Am I correct that there are 554137 rows in the date index? There should be fewer than 50 dates in there.
  2. How do I know which of the three listed indexes it is using?
  3. The total runtime listed was around 6 seconds, but when I run the query without EXPLAIN ANALYZE, it takes around 40 seconds.

First, as Scott Marlowe says, the query itself only takes about 6 seconds to run; the rest is transfer time. It seems slower without EXPLAIN ANALYZE because the result set is much larger than the ten lines of EXPLAIN ANALYZE output and thus takes longer to transfer. If you turned on query logging and ran this query, you would probably find in the log that the query without EXPLAIN ANALYZE runs even faster, since EXPLAIN ANALYZE adds overhead of its own. By the way, pgAdmin is quite slow itself, if that is what you are using.

As for the number of rows in the date index, pg is right. The rows=554137 figure is the planner's estimate of how many rows match date = '2012-12-01' (the actual count was 572221), not the number of distinct dates. Even if you only have 50 distinct values, every row has an entry in the index. Conceptually, the b-tree holds only the 50 distinct key values, but each key leads to all the rows with that value. There is the special case of an index with a WHERE clause (a partial index), which contains only the rows matching that clause, but I assume you are not using one?
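
To make that concrete, here is a toy Python model of an index on a low-cardinality column. It is only a sketch, not PostgreSQL's actual on-disk format: the table contents and the `build_index` helper are made up for illustration.

```python
from collections import defaultdict

def build_index(rows, column):
    """Map each distinct value of `column` to the list of row ids holding it."""
    index = defaultdict(list)
    for row_id, row in enumerate(rows):
        index[row[column]].append(row_id)
    return index

# Hypothetical table: 1,000 rows spread over only 5 distinct dates.
rows = [{"date": f"2012-12-{(i % 5) + 1:02d}"} for i in range(1000)]

date_index = build_index(rows, "date")

distinct_keys = len(date_index)                              # few distinct values...
total_entries = sum(len(ids) for ids in date_index.values()) # ...but one entry per row
matches_for_one_date = len(date_index["2012-12-01"])

print(distinct_keys, total_entries, matches_for_one_date)
```

The number of distinct keys stays small, while a lookup for a single date still returns a large share of the table, which is what the rows figure on the date index scan reflects.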

It is using all of the indexes listed in the EXPLAIN ANALYZE output. In this case it turns each index scan into a bitmap with a bit set for every row that matches the condition of that scan. The three bitmaps can then be combined very quickly (the BitmapAnd step) into a single bitmap of the rows that satisfy all three conditions.
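
The idea of combining bitmaps can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how PostgreSQL implements it: the five sample rows are invented, a plain integer stands in for the bitmap, and `index_scan_bitmap` loops over the rows where a real index scan would walk the index instead.

```python
def index_scan_bitmap(rows, column, value):
    """Return an int used as a bitmap: bit i is set when row i matches."""
    bitmap = 0
    for row_id, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << row_id
    return bitmap

def bitmap_to_row_ids(bitmap):
    """List the positions of the set bits, lowest first."""
    row_ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return row_ids

# Hypothetical sample data; row ids are the list positions 0..4.
rows = [
    {"date": "2012-12-01", "username": "testing", "x_event": "play"},
    {"date": "2012-12-01", "username": "testing", "x_event": "stop"},
    {"date": "2012-12-02", "username": "testing", "x_event": "play"},
    {"date": "2012-12-01", "username": "other", "x_event": "play"},
    {"date": "2012-12-01", "username": "testing", "x_event": "play"},
]

# One bitmap per single-column condition, as in the three Bitmap Index Scans.
by_date = index_scan_bitmap(rows, "date", "2012-12-01")
by_user = index_scan_bitmap(rows, "username", "testing")
by_event = index_scan_bitmap(rows, "x_event", "play")

# The BitmapAnd step: a row survives only if its bit is set in all three.
combined = by_date & by_user & by_event
matching_row_ids = bitmap_to_row_ids(combined)

print(matching_row_ids)  # [0, 4]
```

The AND itself is cheap. In the plan above, most of the time goes into building the two bitmaps that each cover more than 6 million rows.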

If the roughly 5.7 seconds spent building and combining the bitmaps is not good enough, you can try a multicolumn index:

create index index_name on events(username, date, x_event);

I placed username first as I guess it is the column with the highest cardinality.
