How can I improve the performance of a PostgreSQL query?

Question

I have a query that returns the sum of buys, sells and transfers per hourly interval, grouped by account. The problem is that it is quite slow, even though I am only running it for transactions in the last 24 hours. I would like to be able to run it for all transactions ever (800,000 over 2 years). How can I optimise this?

select
    i.interval, ca.contract_address,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 0), 0) as amount_ampl_bought,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 1), 0) as amount_ampl_sold,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 2), 0) as amount_ampl_transferred,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 0), 0) as percent_ampl_bought,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 1), 0) as percent_ampl_sold,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 2), 0) as percent_ampl_transferred
from
    (
        select contract_address
        from addresses a
        where not exists (select 1 from address_tags at where at.address = a.contract_address and at.tag_id = 3)
    ) ca
cross join
    (
        SELECT date_trunc('hour', dd) as interval
        FROM generate_series
        (
            (now() at time zone 'utc') - interval '1 day',
            (now() at time zone 'utc'),
            '1 hour'::interval
        ) dd
    ) i
left join transfers t on (t.from = ca.contract_address or t.to = ca.contract_address) and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;

Example output:

      interval       |              contract_address              | amount_ampl_bought | amount_ampl_sold | amount_ampl_transferred |     percent_ampl_bought     |     percent_ampl_sold      |  percent_ampl_transferred  
---------------------+--------------------------------------------+--------------------+------------------+-------------------------+-----------------------------+----------------------------+----------------------------
 2021-05-08 11:00:00 | 0x0000000000000000000000000000000000000000 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000000000000000000000000000dead |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000006f6502b7f2bbac8c30a3f67e9a |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000084e91743124a982076c59f10084 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000000eb4ec62758aae93400b3e5f7f18 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x00000000000017c75025d397b91d284bbe8fc7f2 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000005117dd3a72e64a705198753fdd54 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000740a22fa209cf6806d38f7605385 |                  0 |                0 |                       0 |                           0 |                          0 |                          0

Link to query visualised:

https://explain.depesz.com/s/SrLf

Indexes I have created on transfers:

 CREATE INDEX transfers_from_to_index ON public.transfers USING btree ("from", "to")
 CREATE INDEX transfers_timestamp_index ON public.transfers USING btree ("timestamp")
 CREATE INDEX transfers_action_index ON public.transfers USING btree (action)
 CREATE UNIQUE INDEX transfers_pkey ON public.transfers USING btree (transaction_hash, log_index)
 CREATE INDEX transfers_supply_percentage_index ON public.transfers USING btree (supply_percentage)
 CREATE INDEX transfers_amount_index ON public.transfers USING btree (amount)
 CREATE INDEX transfers_supply_percentage_timestamp_log_index_index ON public.transfers USING btree (supply_percentage, "timestamp", log_index)
 CREATE INDEX transfers_date_trunc_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")))
 CREATE INDEX transfers_to_index ON public.transfers USING btree ("to")

Indexes I have created on addresses:

 CREATE UNIQUE INDEX addresses_pkey ON public.addresses USING btree (contract_address)
 CREATE INDEX addresses_supply_percentage_index ON public.addresses USING btree (supply_percentage)

Many thanks for your help with this optimisation!

Answer 1

I am pretty sure the problem is the or in the JOIN condition on transfers. Under reasonable assumptions you should be able to split this into two separate left joins:

-- Caveat: if an address has both outgoing and incoming transfers in the same
-- hour, the two joins multiply each other's rows and the sums are inflated.
select i.interval, a.contract_address,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as amount_ampl_bought,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as amount_ampl_sold,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as amount_ampl_transferred,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as percent_ampl_bought,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as percent_ampl_sold,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as percent_ampl_transferred
from addresses a cross join
     -- '1 day' matches the 24-hour window in the question
     generate_series(date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                     date_trunc('hour', now() at time zone 'utc'),
                     '1 hour'::interval
                    ) as i("interval") left join
      transfers tf
      on tf.from = a.contract_address and
         date_trunc('hour', tf.timestamp at time zone 'utc') = i.interval left join
      transfers tt
      on tt.to = a.contract_address and
         date_trunc('hour', tt.timestamp at time zone 'utc') = i.interval
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3
                 )
group by i.interval, a.contract_address;
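
If an address can both send and receive in the same hour, joining transfers twice pairs every outgoing row with every incoming row. One way around that, as a rough sketch rather than a tested solution, is to stack the two sides with UNION ALL first and join once. Table and column names come from the question; the CTE names (hours, ca, t) are made up here, and whether the planner uses the indexes for each branch would need to be checked with EXPLAIN.

```sql
with hours as (
    select h as "interval"
    from generate_series(
        date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
        date_trunc('hour', now() at time zone 'utc'),
        '1 hour'::interval
    ) as h
),
ca as (
    -- addresses without tag 3, as in the question
    select a.contract_address
    from addresses a
    where not exists (
        select 1
        from address_tags at
        where at.address = a.contract_address and at.tag_id = 3
    )
),
t as (
    -- one row per (address, transfer): first the sender side, then the receiver side
    select tf."from" as contract_address, tf."timestamp", tf.action, tf.amount, tf.supply_percentage
    from transfers tf
    union all
    select tt."to", tt."timestamp", tt.action, tt.amount, tt.supply_percentage
    from transfers tt
    -- a self-transfer matched the original OR join once, so count it only once here
    where tt."to" is distinct from tt."from"
)
select i."interval", ca.contract_address,
       coalesce(sum(t.amount) filter (where t.action = 0), 0) as amount_ampl_bought,
       coalesce(sum(t.amount) filter (where t.action = 1), 0) as amount_ampl_sold,
       coalesce(sum(t.amount) filter (where t.action = 2), 0) as amount_ampl_transferred,
       coalesce(sum(t.supply_percentage) filter (where t.action = 0), 0) as percent_ampl_bought,
       coalesce(sum(t.supply_percentage) filter (where t.action = 1), 0) as percent_ampl_sold,
       coalesce(sum(t.supply_percentage) filter (where t.action = 2), 0) as percent_ampl_transferred
from ca
cross join hours i
left join t
       on t.contract_address = ca.contract_address
      and date_trunc('hour', t."timestamp" at time zone 'utc') = i."interval"
group by i."interval", ca.contract_address;
```

Because each transfer appears at most once per address in t, the FILTER aggregates can stay exactly as they were in the question.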

Then for this query, you want indexes on:

address_tags(address, tag_id)
transfers(to, timestamp)
transfers(from, timestamp)
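
Written out as DDL, the list above might look like the following. The index names are invented for illustration, and the reserved words have to be double-quoted in this position.

```sql
-- Hypothetical index names; columns are the ones listed above.
create index if not exists address_tags_address_tag_id_idx
    on address_tags (address, tag_id);

create index if not exists transfers_to_timestamp_idx
    on transfers ("to", "timestamp");

create index if not exists transfers_from_timestamp_idx
    on transfers ("from", "timestamp");
```

Note that the join compares date_trunc('hour', ...) of the timestamp rather than the raw column, so a plain btree on ("to", "timestamp") can only help through its leading column for that condition. An expression index on the truncated value, or a rewrite of the join as a range predicate on "timestamp", would be needed for the second column to be usable.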

(Note that to and from are really bad names for columns because they are SQL keywords.)

The conversion of timestamp to UTC could also pose a problem. I would suggest that you fix your data so that timestamps are all in a common timezone, and I would suggest UTC for that purpose (to avoid issues with daylight saving time).

Answer 2

It looks like it is already doing most of the work for all time periods, and only filtering out the ones you didn't ask for after most of the work is done. So if you want a different time period, just do it. If that is still too slow, then post the plan for that. Then at least we would be optimizing the right query.
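
To get a plan for the longer period, one option is to widen the generate_series bounds and prefix the query with EXPLAIN (ANALYZE, BUFFERS). The sketch below is the query from the question cut down to a single aggregate to keep it short; the '2 years' bound is only an assumption based on the figures in the question. Bear in mind that EXPLAIN ANALYZE really executes the query, and the cross join of all addresses with every hour of two years can be very large.

```sql
explain (analyze, buffers)
select i.interval, ca.contract_address,
       coalesce(sum(t.amount) filter (where t.action = 0), 0) as amount_ampl_bought
from (
    select contract_address
    from addresses a
    where not exists (select 1 from address_tags at
                      where at.address = a.contract_address and at.tag_id = 3)
) ca
cross join (
    select date_trunc('hour', dd) as interval
    from generate_series(
        (now() at time zone 'utc') - interval '2 years',  -- widened from '1 day'
        (now() at time zone 'utc'),
        '1 hour'::interval
    ) dd
) i
left join transfers t
       on (t.from = ca.contract_address or t.to = ca.contract_address)
      and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;
```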

Answer 3

Could you give the below a try? AFAIK there is no reason to cram everything into one query, so I split off some of the parts. I also split the or into two parts, which should allow for better usage of the indexes. And then I noticed that's exactly what Gordon already did above (so much for me thinking I was so clever for finding a workaround that's probably faster than UNION ALL =)

I also added a WHERE on action, as I am not sure whether there are values other than 0, 1, 2. If there are not, you can remove it again.

PS: untested and working blind here, simply being curious (and hopeful =)
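
A quick way to find out which action values actually occur, assuming only the transfers table from the question:

```sql
-- Lists every distinct action value with the number of rows that carry it.
select action, count(*) as transfers
from transfers
group by action
order by action;
```

If only 0, 1 and 2 come back, the extra action filter in the join conditions is redundant.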

DROP TABLE IF EXISTS _combined;

-- Build the (interval, address) grid once and store it in a temp table.
WITH intervals
  AS ( 
       SELECT i AS interval
         FROM generate_series(
                                date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                                date_trunc('hour', (now() at time zone 'utc')),
                                '1 hour'::interval
                            ) AS i ),
     adrs 
  AS (
        SELECT a.contract_address
          FROM addresses a 
        EXCEPT
        SELECT at.address 
          FROM address_tags at
         WHERE at.tag_id = 3)
         
SELECT a.contract_address, i.interval
  INTO TEMPORARY TABLE _combined
  FROM intervals i
 CROSS JOIN adrs a;
           
CREATE UNIQUE INDEX uq_combined ON _combined ("interval", contract_address);

-- Caveat: an address with both outgoing and incoming transfers in the same
-- hour gets its rows multiplied by the two joins.
SELECT c.interval, 
       c.contract_address,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as amount_ampl_bought,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as amount_ampl_sold,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as amount_ampl_transferred,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as percent_ampl_bought,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as percent_ampl_sold,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as percent_ampl_transferred
  FROM _combined c

  LEFT OUTER JOIN transfers tf 
               ON tf.from = c.contract_address  
              AND date_trunc('hour', tf.timestamp at time zone 'utc') = c.interval
              AND tf.action IN (0, 1, 2)

  LEFT OUTER JOIN transfers tt 
               ON tt.to = c.contract_address 
              AND date_trunc('hour', tt.timestamp at time zone 'utc') = c.interval
              AND tt.action IN (0, 1, 2)
       
group by c.interval, c.contract_address;

Ideal indexes for this query would be:

CREATE INDEX transfers_date_trunc_to_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "to") INCLUDE (action, amount, supply_percentage);
CREATE INDEX transfers_date_trunc_from_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "from") INCLUDE (action, amount, supply_percentage);
