Postgres query slow despite index being used
I have the following tables.
The main lead table, with close to 500M rows:
create table lead
(
id integer,
client_id integer,
insert_date integer -- a transformed date that looks like 20201231
);
create index lead_id_index
on lead (id);
create index lead_insert_date_index
on lead (insert_date) include (id, client_id);
create index lead_client_id_index
on lead (client_id) include (id, insert_date);
And then the other tables:
create table last_activity_with_client
(
lead_id integer,
last_activity timestamp,
last_modified timestamp,
client_id integer
);
create index last_activity_with_client_client_id_index
on last_activity_with_client (client_id) include (lead_id, last_activity);
create index last_activity_with_client_last_activity_index
on last_activity_with_client (last_activity desc);
create index last_activity_with_client_lead_id_client_id_index
on last_activity_with_client (lead_id, client_id);
create table lead_last_response_time
(
lead_id integer,
last_response_time timestamp,
last_modified timestamp
);
create index lead_last_response_time_last_response_time_index
on lead_last_response_time (last_response_time desc);
create index lead_last_response_time_lead_id_index
on lead_last_response_time (lead_id);
create table date_dimensions
(
key integer, -- a transformed date that looks like 20201231
date date,
description varchar(256),
day smallint,
month smallint,
quarter char(2),
year smallint,
past_30 boolean
);
create index date_dimensions_key_index
on date_dimensions (key);
I tried running the following query for different client_id values, and it is always slowed down by the bitmap index scan on client_id in the lead table:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity_join.last_activity,
lead_last_response_time.last_response_time
from lead
left join (select * from last_activity_with_client where client_id = 13189) last_activity_join on
lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on
lead.id = lead_last_response_time.lead_id
join date_dimensions date_dimensions on
lead.insert_date = date_dimensions.key
where (date_dimensions.past_30 = true)
and (lead.client_id in (13189))
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
A few results: explain analyze result 2
As you can see, it's using the index, but it's quite slow: always more than 50 seconds. What can I do to make this query run faster? I have some freedom to change the query and the tables too.
create index lead_client_id_index on lead (client_id) include (id, insert_date);
For efficient usage in this query, this should instead be on lead (client_id, insert_date, id). Using INCLUDE here just makes the index less useful, without accomplishing anything. I think the only good reasons to use INCLUDE are if the index is unique on a subset of columns, or if the column being INCLUDEd is of a type which doesn't support btree operations.
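As a rough sketch of that suggestion: columns listed in INCLUDE are stored only as payload and cannot serve as search keys, whereas key columns can narrow the scan. The index name below is made up for illustration, and dropping the old index is an assumption about what you might want, not something you have to do.

```sql
-- Hypothetical index name; the column order is the one suggested above.
-- All three columns are key columns, so client_id and insert_date can both
-- be used to narrow the scan, and id is still available for index-only scans.
create index lead_client_id_insert_date_id_index
    on lead (client_id, insert_date, id);

-- Optional, once the new index is confirmed to be used by the planner:
-- drop index lead_client_id_index;
```

On a 500M-row table a plain create index blocks writes while it builds; create index concurrently avoids that at the cost of a slower build.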
But even the existing index does seem surprisingly slow. I wonder if there is something wrong with it, like fragmentation, or maybe it is sitting on a damaged part of the disk and reads have to be retried repeatedly before succeeding.
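One way to probe the fragmentation guess, as a sketch only: compare the index size before and after rebuilding it, and look at buffer statistics for the slow scan. Nothing here is confirmed by the posted plans; the simplified query is just a stand-in for the part of the original that hits lead_client_id_index.

```sql
-- Size of the existing index before the rebuild.
select pg_size_pretty(pg_relation_size('lead_client_id_index'));

-- Rebuild the index from scratch (blocks writes to the table while it runs).
reindex index lead_client_id_index;

-- Size afterwards; a large drop would point to bloat.
select pg_size_pretty(pg_relation_size('lead_client_id_index'));

-- BUFFERS shows how many pages came from cache versus disk for the scan.
explain (analyze, buffers)
select id, insert_date
from lead
where client_id = 13189;
```

If the size barely changes and most pages are read from disk rather than found in shared buffers, slow storage is a more likely suspect than bloat.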
Try this:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity,
last_response_time
from (
select key
from date_dimensions
where past_30 = true
) date_dimensions
join (select id,
insert_date
from lead
where client_id = 13189
) lead on lead.insert_date = date_dimensions.key
left join (
select lead_id,
last_activity
from last_activity_with_client
where client_id = 13189
) last_activity_join on lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on lead.id = lead_last_response_time.lead_id
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;
or this:
EXPLAIN ANALYZE
with TempResult AS (
select DISTINCT lead.id AS lead_id,
last_activity,
last_response_time
from date_dimensions date_dimensions
join (select id,
insert_date
from lead
where client_id = 13189
) lead on lead.insert_date = date_dimensions.key
left join (
select lead_id,
last_activity
from last_activity_with_client
where client_id = 13189
) last_activity_join on lead.id = last_activity_join.lead_id
left join lead_last_response_time lead_last_response_time on lead.id = lead_last_response_time.lead_id
where date_dimensions.past_30 = true
),
TempCount AS (
select COUNT(*) as total_rows
from TempResult
)
select *
from TempResult, TempCount
order by last_response_time desc NULLS LAST
limit 25 offset 1;