Postgres query slow despite index being used

I have the following tables:

The main lead table, with close to 500M rows:

create table lead
(
    id                  integer,
    client_id           integer,
    insert_date         integer  -- a transformed date that looks like 20201231
);

create index lead_id_index
    on lead (id);

create index lead_insert_date_index
    on lead (insert_date) include (id, client_id);

create index lead_client_id_index
    on lead (client_id) include (id, insert_date);

And then the other tables:

create table last_activity_with_client
(
    lead_id       integer,
    last_activity timestamp,
    last_modified timestamp,
    client_id     integer
);

create index last_activity_with_client_client_id_index
    on last_activity_with_client (client_id) include (lead_id, last_activity);

create index last_activity_with_client_last_activity_index
    on last_activity_with_client (last_activity desc);

create index last_activity_with_client_lead_id_client_id_index
    on last_activity_with_client (lead_id, client_id);


create table lead_last_response_time
(
    lead_id            integer,
    last_response_time timestamp,
    last_modified      timestamp
);

create index lead_last_response_time_last_response_time_index
    on lead_last_response_time (last_response_time desc);

create index lead_last_response_time_lead_id_index
    on lead_last_response_time (lead_id);



create table date_dimensions
(
    key                      integer,  -- a transformed date that looks like 20201231
    date                     date,
    description              varchar(256),
    day                      smallint,
    month                    smallint,
    quarter                  char(2),
    year                     smallint,
    past_30                  boolean
);

create index date_dimensions_key_index
    on date_dimensions (key);

I have tried running the following query for different client_id values, and it is always slowed down by the bitmap index scan on client_id in the lead table:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity_join.last_activity,
                    lead_last_response_time.last_response_time
    from lead
             left join (select * from last_activity_with_client where client_id = 13189) last_activity_join on
        lead.id = last_activity_join.lead_id

             left join lead_last_response_time lead_last_response_time on
        lead.id = lead_last_response_time.lead_id

             join date_dimensions date_dimensions on
        lead.insert_date = date_dimensions.key

    where (date_dimensions.past_30 = true)
      and (lead.client_id in (13189))
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;

A few results: explain analyze result 2

As you can see, it's using the index, but it's quite slow: always more than 50 seconds. What can I do to make this query run faster? I have some freedom to change the query and the tables too.

create index lead_client_id_index on lead (client_id) include (id, insert_date);

For efficient usage in this query, this should instead be on lead (client_id, insert_date, id). Using INCLUDE just makes the index less useful, without accomplishing anything: INCLUDE columns are stored in the index but cannot be used as search keys, so with the current index only client_id can narrow the scan. I think the only good reasons to use INCLUDE are if the index is unique on a subset of columns, or if the column being INCLUDEd is of a type which doesn't support btree operations.
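As a rough sketch of that change, the replacement index could be built as shown below. The index name lead_client_id_insert_date_id_index is made up for illustration, and the use of CONCURRENTLY is an assumption that the table has to stay writable while an index over roughly 500M rows is built; neither comes from the question.

```sql
-- Hypothetical index name; any unused name works.
-- All three columns are key columns, so client_id and insert_date can both
-- be used as index conditions, and id is still available from the index.
-- CONCURRENTLY (an assumption here) avoids blocking writes during the build,
-- but it cannot run inside a transaction block.
create index concurrently lead_client_id_insert_date_id_index
    on lead (client_id, insert_date, id);

-- Once EXPLAIN ANALYZE shows the new index being used, the old one is
-- redundant and can be dropped.
drop index concurrently lead_client_id_index;
```

Re-running the EXPLAIN ANALYZE from the question afterwards should show whether the planner actually picks the new index.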

But even the existing index does seem surprisingly slow. I wonder if there is something wrong with it, like fragmentation, or maybe it is sitting on a damaged part of the disk and reads have to be retried repeatedly before succeeding.
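One way to check for that, sketched under the assumption of PostgreSQL 12 or later (needed for REINDEX ... CONCURRENTLY), is to look at the index size, see how many buffers a plain lookup touches, and rebuild the index if it looks bloated:

```sql
-- How large is the existing index on disk?
select pg_size_pretty(pg_relation_size('lead_client_id_index'));

-- BUFFERS reports how many pages were found in shared buffers ("hit")
-- versus fetched from outside them ("read") for this lookup alone.
explain (analyze, buffers)
select id, insert_date
from lead
where client_id = 13189;

-- Rebuild the index from scratch to remove any bloat or fragmentation.
-- CONCURRENTLY keeps the table writable during the rebuild.
reindex index concurrently lead_client_id_index;
```

If the lookup is still slow right after the rebuild, and most pages are reported as read rather than hit, the storage itself is worth a closer look.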

Try this:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity,
                    last_response_time
    from (
             select key
             from date_dimensions
             where past_30 = true
         ) date_dimensions
             join (
                 select id,
                        insert_date
                 from lead
                 where client_id = 13189
             ) lead on lead.insert_date = date_dimensions.key
             left join (
                 select lead_id,
                        last_activity
                 from last_activity_with_client
                 where client_id = 13189
             ) last_activity_join on lead.id = last_activity_join.lead_id
             left join lead_last_response_time lead_last_response_time
                 on lead.id = lead_last_response_time.lead_id
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;

Or this:

EXPLAIN ANALYZE
with TempResult AS (
    select DISTINCT lead.id AS lead_id,
                    last_activity,
                    last_response_time
    from date_dimensions date_dimensions
             join (
                 select id,
                        insert_date
                 from lead
                 where client_id = 13189
             ) lead on lead.insert_date = date_dimensions.key
             left join (
                 select lead_id,
                        last_activity
                 from last_activity_with_client
                 where client_id = 13189
             ) last_activity_join on lead.id = last_activity_join.lead_id
             left join lead_last_response_time lead_last_response_time
                 on lead.id = lead_last_response_time.lead_id
    where date_dimensions.past_30 = true
),
     TempCount AS (
         select COUNT(*) as total_rows
         from TempResult
     )
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
