PostgreSQL - Query Optimization
The query below takes about 15-20 seconds to run.
with cte0 as (
  SELECT label, date,
         CASE WHEN Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
              THEN '1'::numeric
              ELSE '0'::numeric
         END As "unique"
  FROM table_data
  LEFT JOIN table_mapper ON table_mapper."type" = table_data."type"
  WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value"
FROM cte0
WHERE "date" = current_date -1
In the table table_data I have an index on the date column:
CREATE INDEX ix_cli_date ON table_data USING btree (date);
Table definition (\d table_data):
        Table "public.table_data"
  Column  |          Type          | Modifiers
----------+------------------------+-----------
 date     | date                   | not null
 number   | bigint                 | not null
 time     | time without time zone | not null
 end time | time without time zone | not null
 duration | integer                | not null
 time1    | integer                | not null
 time2    | integer                | not null
 time3    | integer                | not null
 time4    | integer                | not null
 time5    | integer                | not null
 time6    | integer                | not null
 time7    | integer                | not null
 type     | text                   | not null
 name     | text                   | not null
 id1      | integer                | not null
 id2      | integer                | not null
 key      | integer                | not null
 status   | text                   | not null
Indexes:
    "ix_cli_date" btree (date)
Table definition (\d table_mapper):
 Table "public.table_mapper"
 Column | Type | Modifiers
--------+------+-----------
 type   | text | not null
 label  | text | not null
 label2 | text | not null
 label3 | text | not null
 label4 | text | not null
 label5 | text | not null
Result (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)
  CTE cte0
    -> WindowAgg (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)
         -> Sort (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)
              Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time"
              Sort Method: external merge Disk: 39480kB
              -> Hash Left Join (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)
                   Hash Cond: (table_data."type" = table_mapper."type")
                   -> Index Scan using ix_cli_date on table_data (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)
                        Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date
                   -> Hash (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)
                        Buckets: 1024 Batches: 1 Memory Usage: 15kB
                        -> Seq Scan on table_mapper (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)
  -> Append (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)
       -> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)
            -> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)
                 Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
       -> Aggregate (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)
            -> CTE Scan on cte0 (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)
                 Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))
       -> Aggregate (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)
            -> CTE Scan on cte0 (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)
                 Filter: (date = (('now'::text)::date - 1))
Total runtime: 25710.506 ms
Description:
I'm taking the unique count and the repeated count from table_data. This is where LEAD helped me out: I give the value 0 to the last repeated value of a column.
Suppose a column contains x three times. I give the value 1 to the first two x, and the third x is given 0.
Through a CTE I take all rows from table_data for a defined date range, concatenate the strings, and use lead() so that each row gets a value of 1 or 0 according to that criterion.
If the lead is null, the row is counted as 1; if it is not null, as 0.
Then I return three rows (MTD, Current Week and FTD), each calculated from the sum() of the values I got from the lead and the count(*) of all rows.
For MTD I have the sum and count for the current month.
For Week it's the current week, and FTD is for yesterday.
WITH cte AS (
SELECT d.thedate
, lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
ORDER BY d.thetime) AS leader
FROM table_data d
LEFT JOIN table_mapper m USING (type)
WHERE thedate BETWEEN date_trunc('month', current_date - 1)
AND current_date - 1
)
SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM cte
UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1
UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM cte
WHERE thedate = current_date - 1;
The CTE makes sense for big tables, so you only need to scan it once. For smaller tables it may be faster without ...
Using thedate instead of the reserved word date (in standard SQL). Likewise thetime and uni instead of time and unique. Etc.
Simplified the lead() call. You get a value or NULL for the leading row. That seems to be the only relevant information.
It's also pointless to repeat columns from the PARTITION clause in the ORDER BY clause of a window function.
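To illustrate what the simplified lead() call returns, here is a minimal sketch on made-up rows. The VALUES list and the column names k and t are hypothetical and not part of the question. Within each partition every row receives the value of the following row, and the last row of the partition receives NULL. Since k is already the partition key, ordering by t alone is sufficient.

```sql
-- Hypothetical sample data: three rows for key 'x', one row for key 'y'.
SELECT k
     , t
     , lead(k) OVER (PARTITION BY k ORDER BY t) AS leader  -- NULL on the last row per k
FROM  (VALUES ('x', 1), ('x', 2), ('x', 3), ('y', 1)) AS v(k, t)
ORDER  BY k, t;
```

Here the rows ('x', 3) and ('y', 1) get a NULL leader, because no later row exists in their partition.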
Building on that, count(leader) / count(*) instead of sum(uni) / count(uni). That's a bit faster. count(column) only counts non-null values, while count(*) counts all rows.
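The difference between the two count variants can be checked on a small made-up set. The four literal rows below are hypothetical sample data, and the column name leader is borrowed from the query above. The cast to numeric is needed because dividing two bigint counts would otherwise truncate to an integer.

```sql
-- Hypothetical sample data: two non-null and two NULL values.
SELECT count(leader) AS non_null_rows                           -- skips NULLs
     , count(*)      AS all_rows                                -- counts every row
     , round(count(leader)::numeric / count(*) * 100, 1) AS pct -- numeric division
FROM  (VALUES ('x'), ('x'), (NULL), (NULL)) AS v(leader);
```

With this input, count(leader) is 2 and count(*) is 4, so pct comes out as 50.0.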
The condition for the first leg of the UNION query was redundant, since the CTE already restricts rows to the same date range.
More advice and links about data definition are in the comments to the question.
You should have primary keys. I suggest a serial column as surrogate pk for table_data:
ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;
Make type the primary key of table_mapper (also needed for the following fk constraint):
ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey PRIMARY KEY (type);
Add a foreign key constraint for type to guarantee referential integrity. Something like:
ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey
FOREIGN KEY (type) REFERENCES table_mapper (type)
ON UPDATE CASCADE ON DELETE NO ACTION;
For ultimate read performance (at some cost for writes), add a multi-column index to possibly allow index-only scans for the above query:
CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);
As your query is written, you are referring to the CTE three times. Instead, you can use conditional aggregation if you are willing to have the values in three columns rather than three rows:
SELECT round(sum(CASE WHEN "date" BETWEEN date_trunc('month', current_date - 1) AND current_date - 1 THEN "unique" ELSE 0 END) /
             sum(CASE WHEN "date" BETWEEN date_trunc('month', current_date - 1) AND current_date - 1 THEN 1 ELSE 0 END) * 100, 1) AS mtd
. . .
FROM cte0
This may speed up the query. In addition, you could then incorporate this logic into the CTE query itself, eliminating the materialization step as well.
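A fuller three-column version of that idea might look like the sketch below. It is only an illustration under stated assumptions: cte0 is replaced by a few hypothetical sample rows so the statement runs on its own, and the output column names mtd, week and ftd are invented here. The upper bound on the date is left out of the CASE conditions on the assumption that the CTE already restricts rows to dates up to yesterday, as it does in the question.

```sql
-- Sketch only: this cte0 holds hypothetical sample rows standing in for
-- the windowed CTE from the question (columns "date" and "unique").
WITH cte0("date", "unique") AS (
   VALUES (current_date - 1, 1::numeric)
        , (current_date - 1, 0::numeric)
        , (date_trunc('month', current_date - 1)::date, 1::numeric)
)
SELECT round(sum("unique") / count(*) * 100, 1) AS mtd  -- all rows are month-to-date
     , round(sum(CASE WHEN "date" >= date_trunc('week', current_date - 1) THEN "unique" END)
           / count(CASE WHEN "date" >= date_trunc('week', current_date - 1) THEN 1 END) * 100, 1) AS week
     , round(sum(CASE WHEN "date" = current_date - 1 THEN "unique" END)
           / count(CASE WHEN "date" = current_date - 1 THEN 1 END) * 100, 1) AS ftd
FROM   cte0;
```

The CASE expressions yield NULL for rows outside the period, and both sum() and count(expression) ignore NULLs, so each column is computed in a single pass over the CTE.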