PostgreSQL-查询优化

Question

I have this below query which takes about 15-20 secs to run. 我在下面的查询中有大约15-20秒的运行时间。

with cte0 as (
    SELECT
        label,
        date,
        CASE
            WHEN
                Lead(label || date || "number") OVER (PARTITION BY label || date || "number" ORDER BY "label", "date", "number", "time") IS NULL
            THEN
                '1'::numeric
            ELSE
                '0'::numeric
        END As "unique"
    FROM table_data
    LEFT JOIN table_mapper ON
        table_mapper."type" = table_data."type"
    WHERE Date BETWEEN date_trunc('month', current_date - 1) and current_date - 1
)
SELECT 'MTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1
UNION ALL
SELECT 'Week' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" BETWEEN date_trunc('week', current_date - 1) AND current_date -1
UNION ALL
SELECT 'FTD' as "label", round(sum("unique") / count("unique") *100,1) as "value" FROM cte0 WHERE "date" = current_date -1

In the table table_data I have a index on date column. 在表table_data我有一个date索引列。

CREATE INDEX ix_cli_date
  ON table_data
  USING btree
  (date);

Table Definition ( `\\d table_data` ) 表定义（ `\\d table_data` ）

Table "public.table_data"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 date             | date                   | not null
 number           | bigint                 | not null
 time             | time without time zone | not null
 end time         | time without time zone | not null
 duration         | integer                | not null
 time1            | integer                | not null
 time2            | integer                | not null
 time3            | integer                | not null
 time4            | integer                | not null
 time5            | integer                | not null
 time6            | integer                | not null
 time7            | integer                | not null
 type             | text                   | not null
 name             | text                   | not null
 id1              | integer                | not null
 id2              | integer                | not null
 key              | integer                | not null
 status           | text                   | not null
Indexes:
    "ix_cli_date" btree (date)

Table Definition ( \\d table_mapper ) 表定义（ \\d table_mapper ）

Table "public.table_mapper"
   Column   | Type | Modifiers
------------+------+-----------
 type       | text | not null
 label     | text | not null
 label2     | text | not null
 label3     | text | not null
 label4     | text | not null
 label5     | text | not null

EXPLAIN ANALYZE of the query 查询的解释分析

Result  (cost=184342.66..230332.86 rows=3 width=64) (actual time=23377.923..25695.478 rows=3 loops=1)"
  CTE cte0"
    ->  WindowAgg  (cost=121516.06..156751.65 rows=612793 width=23) (actual time=14578.000..18985.958 rows=696157 loops=1)"
          ->  Sort  (cost=121516.06..123048.04 rows=612793 width=23) (actual time=14577.975..17084.405 rows=696157 loops=1)"
                Sort Key: (((table_mapper.label || (table_data.date)::text) || (table_data."number")::text)), table_mapper.label, table_data.date, table_data."number", table_data."time""
                Sort Method: external merge  Disk: 39480kB"
                ->  Hash Left Join  (cost=11.96..37474.21 rows=612793 width=23) (actual time=1.449..3308.718 rows=696157 loops=1)"
                      Hash Cond: (table_data."type" = table_mapper."type")"
                      ->  Index Scan using ix_cli_date on table_data  (cost=0.02..29036.36 rows=612793 width=38) (actual time=0.141..946.648 rows=696157 loops=1)"
                            Index Cond: ((date >= date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)) AND (date   Hash  (cost=7.53..7.53 rows=353 width=25) (actual time=1.275..1.275 rows=336 loops=1)"
                            Buckets: 1024  Batches: 1  Memory Usage: 15kB"
                            ->  Seq Scan on table_mapper  (cost=0.00..7.53 rows=353 width=25) (actual time=0.020..0.589 rows=336 loops=1)"
  ->  Append  (cost=27591.00..73581.21 rows=3 width=64) (actual time=23377.920..25695.467 rows=3 loops=1)"
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=23377.917..23377.918 rows=1 loops=1)"
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=14578.052..22335.236 rows=696157 loops=1)"
                    Filter: ((date = date_trunc('month'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
        ->  Aggregate  (cost=27591.00..27591.02 rows=1 width=32) (actual time=1741.509..1741.510 rows=1 loops=1)"
              ->  CTE Scan on cte0  (cost=0.00..27575.68 rows=3064 width=32) (actual time=20.009..1522.352 rows=168261 loops=1)"
                    Filter: ((date = date_trunc('week'::text, ((('now'::text)::date - 1))::timestamp with time zone)))"
        ->  Aggregate  (cost=18399.11..18399.13 rows=1 width=32) (actual time=576.029..576.030 rows=1 loops=1)"
              ->  CTE Scan on cte0  (cost=0.00..18383.79 rows=3064 width=32) (actual time=9.308..546.735 rows=23486 loops=1)"
                    Filter: (date = (('now'::text)::date - 1))"
Total runtime: 25710.506 ms"

Description : 说明：

I'm taking the unique count and repeated count from the table_data and this where LEAD helped me out where I give the value 0 for the last repeated value of a column. 我从table_data获取唯一计数和重复计数，而这在LEAD帮助我的地方给出了列的最后重复值0。

Suppose I have 3 x in a column. 假设我在一列中有3 x 。 I give 1 value to the first 2 x and the 3rd x is given 0. 我给前2个x赋予1值，第3个x赋予0。

Actually through a cte I'm taking the entire rows from the table table_data and doing some calculation using the lead and concatinating the strings for a defined date range where each row 1 and 0 value is defined as per the criteria. 实际上通过cte我从表中取整行table_data和做使用引一些计算和concatinating琴弦对，其中每一行定义的时间范围1和0值被定义为每标准。

If the lead is null it'll be counted as 1 and if it is not null then 0. 如果线索为null，则将其计为1；如果线索不为null，则将计为0。

And the I return 3 rows MTD , Current Week and FTD respectively with a calculation on taking the sum() I got from the lead and the count(*) entire rows. 然后，我分别返回3行MTD ， Current Week和FTD并进行计算，以计算从前导中获得的sum()和整行中的count(*) 。

For MTD I have the sum and count for the current month. 对于MTD，我具有当月的总数。

For Week - It's the current week and FTD is for yesterday. 对于星期-这是当前星期，而FTD是昨天。

Answer 1

WITH cte AS (
   SELECT d.thedate
        , lead(m.label) OVER (PARTITION BY m.label, d.thedate, d.number
                              ORDER BY d.thetime) AS leader
   FROM   table_data d
   LEFT   JOIN table_mapper m USING (type)
   WHERE  thedate BETWEEN date_trunc('month', current_date - 1)
                  AND current_date - 1
   )

SELECT 'MTD' AS label, round(count(leader)::numeric / count(*) * 100, 1) AS val
FROM   cte

UNION ALL
SELECT 'Week', round(count(leader)::numeric / count(*) * 100, 1)
FROM   cte
WHERE  thedate BETWEEN date_trunc('week', current_date - 1) AND current_date - 1

UNION ALL
SELECT 'FTD', round(count(leader)::numeric / count(*) * 100, 1)
FROM   cte
WHERE  thedate = current_date - 1;

The CTE makes sense for big tables, so you only need to scan it once. 对于大型表，CTE很有意义，因此您只需扫描一次即可。 For smaller tables it may be faster without ... 对于较小的表，可能会更快，而无需...
Using thedate instead of reserved word date (in standard SQL). 使用thedate而不是保留字date （在标准SQL中）。 thetime , uni instead of time , unique . thetime ， uni的，而不是time ， unique 。 Etc. 等等。
Simplified the lead() call. 简化了lead()调用。 You get a value or NULL for the leading row. 您将获得前导行的值或NULL。 That seems the be the only relevant information. 这似乎是唯一相关的信息。
It's also pointless to repeat columns from the PARTITION clause in the ORDER BY clause of a window function . 在窗口函数的ORDER BY子句中重复PARTITION子句中的列也是没有意义的。
Building on that, count(leader) / count(*) instead of sum(uni) / count(uni) . 在此基础上，使用count(leader) / count(*)代替sum(uni) / count(uni) 。 That's a bit faster. 那快一点。 count(column) only counts non-null values, while count(*) counts all rows. count(column)仅计算非空值，而count(*)计算所有行。
The condition for the first leg of the UNION query was redundant. UNION查询的第一段条件是多余的。
More advice and links about data definition in the comments to the question. 在问题注释中，有关数据定义的更多建议和链接。

Table design / Indexes 表格设计/索引

You should have primary keys. 您应该有主键。 I suggest serial column as surrogate pk for table_data : 我建议使用serial列作为table_data替代pk：

ALTER TABLE table_data ADD COLUMN table_data_id serial PRIMARY KEY;

Make type the primary key of table_mapper (also needed for the following fk constraint): 将type作为table_mapper的主键（以下fk约束也需要）：

ALTER TABLE table_mapper ADD CONSTRAINT table_mapper_pkey (type);

Sdd a foreign key constraint for type to guarantee referential integrity. 为type添加外键约束，以确保引用完整性。 Something like: 就像是：

ALTER TABLE table_data ADD CONSTRAINT table_data_type_fkey
  FOREIGN KEY (type) REFERENCES table_mapper (type)
  ON UPDATE CASCADE ON DELETE NO ACTION;

For ultimate read performance (at some cost for writes), add a multi-column index to possibly allow index-only scans for above query: 为了获得最高的读取性能（需要付出一定的写操作成本），请添加多列索引以可能允许上述查询的仅索引扫描：

CREATE INDEX table_data_foo_idx ON table_data (thedate, number, thetime);

Answer 2

As your query is written, you are referring to the CTE three times. 在编写查询时，您将三次引用CTE。 Instead, you can use conditional aggregation if you are willing to have the values in three columns rather than three rows: 相反，如果您希望将值包含在三列而不是三行中，则可以使用条件聚合：

SELECT round(sum("date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1 then "unique" else 0 END)) /
             sum("date" BETWEEN date_trunc('month', current_date - 1) AND current_date -1 then 1 else 0 END)) *100,1) as mtd
     . . .
FROM CTE

This may speed up the query. 这样可以加快查询速度。 In addition, you could then incorporate this logic into the CTE query itself, eliminating the materialization step as well. 此外，您还可以将此逻辑合并到CTE查询本身中，从而也消除了实现步骤。

PostgreSQL-查询优化

问题描述

Table Definition ( `\\d table_data` ) 表定义（ `\\d table_data` ）

EXPLAIN ANALYZE of the query 查询的解释分析

2 个解决方案

解决方案1
1 已采纳 2014-04-28 12:21:34

Table design / Indexes 表格设计/索引

解决方案2
0 2014-04-28 11:36:57

PostgreSQL-查询优化

问题描述

Table Definition ( \\d table_data ) 表定义（ \\d table_data ）

EXPLAIN ANALYZE of the query 查询的解释分析

2 个解决方案

解决方案1 1 已采纳 2014-04-28 12:21:34

Table design / Indexes 表格设计/索引

解决方案2 0 2014-04-28 11:36:57

Table Definition ( `\\d table_data` ) 表定义（ `\\d table_data` ）

解决方案1
1 已采纳 2014-04-28 12:21:34

解决方案2
0 2014-04-28 11:36:57