Postgres does not use an index even when less than 5% of rows are returned

I have a Postgres table with the following structure:

+---------+-------------+-------------+----------+---------+---------+
|   id    |  timestamp  |   numvalues |  text1   |  text2  |  text3  |
+---------+-------------+-------------+----------+---------+---------+
|abcd12344|    4124135  |[1,2,53,1241]| apple    | banana  | papaya  |
+---------+-------------+-------------+----------+---------+---------+

id - random alphanumeric value
timestamp - epoch timestamp
numvalues - array of integers
text(n) - text values
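To reproduce the problem it helps to pin down the column types, which the question does not state. The DDL below is a sketch under assumptions: `tablename` is the placeholder name used in the queries in this question, `"timestamp"` is assumed to be `bigint` holding epoch seconds, and the text columns are assumed to be `text`. The index name `table_time_idx` is taken from the EXPLAIN output further down; its definition here is a guess.

```sql
-- Hypothetical DDL: every column type here is an assumption.
CREATE TABLE tablename (
    id          text,        -- random alphanumeric value
    "timestamp" bigint,      -- epoch timestamp in seconds (assumed bigint)
    numvalues   integer[],   -- array of roughly 31 integers
    text1       text,
    text2       text,
    text3       text
);

-- Plain B-tree index on the timestamp column. The name matches the
-- plan output below; the definition itself is assumed.
CREATE INDEX table_time_idx ON tablename ("timestamp");
```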

The table has about 150 million rows.

I use an inner query to get the nth percentile of the data, ordered by one of the values in the array. Then I need the averages of several other values from the array. The array has around 31 elements.
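The queries below wrap each array element in COALESCE(..., 0). In PostgreSQL, array subscripts start at 1 by default, and reading past the end of an array returns NULL instead of raising an error, so COALESCE turns missing elements into 0 before they reach AVG. A quick check using the sample row from the table above:

```sql
-- Array subscripts are 1-based; an out-of-range subscript yields NULL.
SELECT (ARRAY[1, 2, 53, 1241])[1]               AS first_element,  -- 1
       (ARRAY[1, 2, 53, 1241])[4]               AS last_element,   -- 1241
       (ARRAY[1, 2, 53, 1241])[30]              AS past_the_end,   -- NULL
       COALESCE((ARRAY[1, 2, 53, 1241])[30], 0) AS coalesced;      -- 0
```

Note the design consequence: AVG(COALESCE(numvalues[n], 0)) counts a missing element as 0, whereas plain AVG(numvalues[n]) would skip NULLs and average only the rows that have that element.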

SELECT AVG(COALESCE(numvalues[2], 0))::NUMERIC(10,0), AVG(COALESCE(numvalues[3], 0))::NUMERIC(10,0)...AVG(COALESCE(numvalues[12], 0))::NUMERIC(10,0) 
FROM (SELECT timestamp, numvalues, ntile(100) 
      OVER (ORDER BY numvalues[1]) pval FROM tablename WHERE timestamp >= somevalue and timestamp <= somevalue) innertable 
WHERE pval >= x and pval <= y;

This returns about 7 million of the 150 million rows, which is about 5% of the table. However, it does not use the index: EXPLAIN ANALYZE shows a Seq Scan instead, even when enable_seqscan is set to off.
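Setting enable_seqscan to off does not forbid sequential scans; it only tells the planner to avoid them when any other access path exists. If no index can be applied to the WHERE clause, PostgreSQL still falls back to a Seq Scan, so a Seq Scan that survives this setting suggests the index is not usable for the condition at all. The sketch below runs that experiment on a small hypothetical stand-in table (`seqscan_demo`, with invented data); the numeric bounds mimic the slow query.

```sql
BEGIN;

-- Hypothetical stand-in table: integer epoch column, an integer array,
-- and a plain B-tree index on the epoch column.
CREATE TEMP TABLE seqscan_demo ON COMMIT DROP AS
SELECT g                     AS "timestamp",
       ARRAY[g % 97, g % 13] AS numvalues
FROM generate_series(1, 100000) AS g;
CREATE INDEX seqscan_demo_time_idx ON seqscan_demo ("timestamp");
ANALYZE seqscan_demo;

-- Discourage sequential scans for this transaction only.
SET LOCAL enable_seqscan = off;

-- Numeric bounds turn the condition into ("timestamp")::numeric >= ...,
-- which the index on "timestamp" cannot serve, so the plan should still
-- show a Seq Scan despite the setting.
EXPLAIN
SELECT numvalues
FROM seqscan_demo
WHERE "timestamp" >= 500.00
  AND "timestamp" <= 600.00;

ROLLBACK;
```

The query selects `numvalues` rather than `count(*)` on purpose: if only indexed columns were needed, the planner could pick a full index-only scan with a filter, which would hide the effect.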

However, a similar query:

SELECT text1, count(distinct(id))
FROM (SELECT timestamp, id, text1, numvalues, ntile(100) 
      OVER (ORDER BY numvalues[1]) pval FROM tablename WHERE timestamp >= somevalue and timestamp <= somevalue) innertable 
WHERE pval >= x and pval <= y GROUP BY text1;

does use the index.

The index is on the timestamp column.
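Because the plan depends on the exact column type, it is worth confirming both the index definition and the declared type of the timestamp column. A sketch using the standard catalog views; `'tablename'` is the placeholder from the question and must be replaced with the real table name (unquoted identifiers are stored in lower case).

```sql
-- Definition of every index on the table.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'tablename';

-- Declared type of the timestamp column (e.g. integer or bigint).
SELECT column_name, data_type
FROM information_schema.columns
WHERE table_name = 'tablename'
  AND column_name = 'timestamp';
```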

The results of EXPLAIN ANALYZE:

explain analyze select text1, count(distinct(id)) 
    from (select timestamp, text1, numvalues, id, ntile(100) over (order by numvalues[1]) pval from table where timestamp >= 1431100800 and timestamp <= 1431108000 and numvalues[1] NOTNULL) innertable 
    where pval >= 90 and pval <= 90 group by text1;

                                  QUERY PLAN

--------------------------------------------------------------------------------
GroupAggregate  (cost=4554118.06..4554641.66 rows=1 width=28) (actual time=218641.221..219051.984 rows=20 loops=1)
   ->  Sort  (cost=4554118.06..4554292.59 rows=69812 width=28) (actual time=218640.546..218728.294 rows=71441 loops=1)
         Sort Key: innertable.text1
         Sort Method: quicksort  Memory: 8654kB
         ->  Subquery Scan on innertable  (cost=4094722.75..4548501.27 rows=69812 width=28) (actual time=216502.946..218521.666 rows=71441 loops=1)
           Filter: ((innertable.pval >= 90) AND (innertable.pval <= 90))
           Rows Removed by Filter: 7072674
           ->  WindowAgg  (cost=4094722.75..4339065.03 rows=13962416 width=118) (actual time=202276.333..211374.235 rows=7144115 loops=1)
                 ->  Sort  (cost=4094722.75..4129628.79 rows=13962416 width=118) (actual time=164912.487..190272.316 rows=7144115 loops=1)
                       Sort Key: (table.numvalues[9])
                       Sort Method: external merge  Disk: 1387704kB
                       ->  Index Scan using table_time_idx on table  (cost=0.57..1578710.87 rows=13962416 width=118) (actual time=0.124..141014.505 rows=7144115 loops=1)
                             Index Cond: (("timestamp" >= 1431100800) AND ("timestamp" <= 1431108000))
                             Filter: (numvalues[1] IS NOT NULL)
                             Rows Removed by Filter: 7090075
 Total runtime: 219340.709 ms
(16 rows)
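The 71,441 rows that survive `pval >= 90 and pval <= 90` (equivalent to `pval = 90`) are one hundredth, rounded down, of the 7,144,115 rows produced by the WindowAgg. ntile(100) splits the ordered rows into 100 buckets as evenly as possible, and when the row count is not divisible by 100 the first (count mod 100) buckets get one extra row. Here 7,144,115 = 100 × 71,441 + 15, so buckets 1 to 15 hold 71,442 rows and buckets 16 to 100, including bucket 90, hold 71,441. The same pattern on a small series of 1,015 rows:

```sql
-- 1,015 rows into 100 buckets: 1015 = 100 * 10 + 15,
-- so buckets 1-15 should get 11 rows and buckets 16-100 should get 10.
SELECT pval, count(*) AS rows_in_bucket
FROM (SELECT ntile(100) OVER (ORDER BY n) AS pval
      FROM generate_series(1, 1015) AS g(n)) AS buckets
GROUP BY pval
ORDER BY pval;
```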



explain analyze select
        avg(coalesce(numvalues[9], 0))::NUMERIC(10,0) mean9,
        avg(coalesce(numvalues[30],0))::NUMERIC(10,0),
        avg(coalesce(numvalues[8],0))::NUMERIC(10,0) - avg(coalesce(numvalues[30], 0))::NUMERIC(10,0) mean0,
        avg(coalesce(numvalues[10],0))::NUMERIC(10,0) - avg(coalesce(numvalues[8], 0))::NUMERIC(10,0) mean1,
        avg(coalesce(numvalues[14],0))::NUMERIC(10,0) - avg(coalesce(numvalues[10], 0))::NUMERIC(10,0) mean2,
        avg(coalesce(numvalues[13],0))::NUMERIC(10,0) - avg(coalesce(numvalues[14], 0))::NUMERIC(10,0) mean3,
        avg(coalesce(numvalues[9],0))::NUMERIC(10,0) - avg(coalesce(numvalues[13], 0))::NUMERIC(10,0) mean4
    from (select timestamp, id, numvalues, ntile(100) over (order by numvalues[1] ) pval from table where timestamp >= 1431093600.00 and timestamp <= 1431100800.00 and numvalues[9] NOTNULL) innerTable 
    where pval >= 90.00 and pval <= 90.00 ;

QUERY PLAN                                                                        
--------------------------------------------------------------------------------
 Aggregate  (cost=12662077.32..12662077.37 rows=1 width=82) (actual time=650343.769..650343.770 rows=1 loops=1)
   ->  Subquery Scan on innertable  (cost=12634854.54..12661968.84 rows=3615 width=82) (actual time=647745.962..650232.725 rows=71441 loops=1)
         Filter: (((innertable.pval)::numeric >= 90.00) AND ((innertable.pval)::numeric <= 90.00))
         Rows Removed by Filter: 7072674
          ->  WindowAgg  (cost=12634854.54..12647507.88 rows=723048 width=248) (actual time=632388.293..642338.237 rows=7144115 loops=1)
           ->  Sort  (cost=12634854.54..12636662.16 rows=723048 width=248) (actual time=599893.771..617413.102 rows=7144115 loops=1)
                 Sort Key: (table.numvalues[9])
                 Sort Method: external merge  Disk: 3214248kB
                 ->  Seq Scan on req_p0swajch2t  (cost=0.00..12480460.48 rows=723048 width=248) (actual time=0.041..575423.062 rows=7144115 loops=1)
                       Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))
                       Rows Removed by Filter: 138191935
Total runtime: 650785.126 ms
(12 rows)

Can anyone explain why Postgres uses the index for one query but not the other? Running VACUUM ANALYZE didn't help either.

Is there any way to speed up the queries? A query over the whole table would take about 20-30 minutes! Partitioning didn't make much difference because the queries span multiple partitions; once a query spanned a larger number of partitions, the improvement was only a couple of minutes.

Apparently I can't reply in comments, so I have to post this as an answer.

EXPLAIN ANALYZE shows that your timestamp column is compared with numeric values (timestamp >= 1431093600.00 and timestamp <= 1431100800.00), and because of that it is cast to numeric:

Filter: ((numvalues[1] IS NOT NULL) AND (("timestamp")::numeric >= 1431100800.00) AND (("timestamp")::numeric <= 1431108000.00))
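The cast comes from the literals themselves: in PostgreSQL an unquoted number with a decimal point is typed numeric, while a whole number that fits in four bytes is typed integer (and a larger one bigint). There is no integer-to-numeric comparison operator, so comparing an integer or bigint column with a numeric value makes the parser cast the column to numeric, which is what the Filter line shows. A quick way to see the literal types:

```sql
-- Types the parser assigns to unquoted numeric literals.
SELECT pg_typeof(1431093600)    AS whole_number,     -- integer
       pg_typeof(1431093600.00) AS with_decimals,    -- numeric
       pg_typeof(5000000000)    AS too_big_for_int;  -- bigint
```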

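Because the index is on "timestamp" itself and not on ("timestamp")::numeric, the planner cannot use it for these conditions, which is also why setting enable_seqscan to off made no difference. Investigate why the values are being sent as numeric and fix that.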
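Assuming "timestamp" is an integer or bigint column, one fix is to send the bounds as whole numbers (1431093600 and 1431100800 instead of 1431093600.00 and 1431100800.00), or to cast the bound rather than the column, so the comparison stays integer-to-integer and the B-tree index applies again. The sketch below checks this on a hypothetical stand-in table, `fix_demo`, whose values and size are invented for illustration. The outer filter is written as `pval = 90`, which also avoids the harmless `pval::numeric` cast visible in the second plan.

```sql
BEGIN;

-- Hypothetical stand-in for the question's table: integer epoch column,
-- integer-array values, and a B-tree index on the epoch column.
CREATE TEMP TABLE fix_demo ON COMMIT DROP AS
SELECT 1431000000 + g          AS "timestamp",
       ARRAY[g % 1000, g % 97] AS numvalues
FROM generate_series(1, 200000) AS g;
CREATE INDEX fix_demo_time_idx ON fix_demo ("timestamp");
ANALYZE fix_demo;

-- Same shape as the slow query, but with whole-number bounds, so the
-- column is compared without a cast and the index is usable.
-- If the client must send 1431093600.00, cast the bound instead:
--   WHERE "timestamp" >= 1431093600.00::bigint
EXPLAIN
SELECT avg(coalesce(numvalues[2], 0))::numeric(10,0) AS mean2
FROM (SELECT numvalues,
             ntile(100) OVER (ORDER BY numvalues[1]) AS pval
      FROM fix_demo
      WHERE "timestamp" >= 1431093600
        AND "timestamp" <= 1431100800) AS innertable
WHERE pval = 90;

ROLLBACK;
```

In the plan, the timestamp condition should now appear without `::numeric`; whether the planner then picks an index scan or a bitmap scan depends on the table's statistics.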

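Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.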

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM