How can I improve the performance of my postgresql query?

I have a query that returns the sums of buys, sells and transfers per hourly interval, grouped by account. The problem is that it is quite slow, and that is with only the transactions from the last 24 hours. I would like to be able to run it over all transactions ever (800,000 over 2 years). How can I optimise this?

select
    i.interval, ca.contract_address,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 0), 0) as amount_ampl_bought,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 1), 0) as amount_ampl_sold,
    coalesce(SUM(t.amount) FILTER (WHERE t.action = 2), 0) as amount_ampl_transferred,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 0), 0) as percent_ampl_bought,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 1), 0) as percent_ampl_sold,
    coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 2), 0) as percent_ampl_transferred
from
    (
        select contract_address
        from addresses a
        where not exists (select 1 from address_tags at where at.address = a.contract_address and at.tag_id = 3)
    ) ca
cross join
    (
        SELECT date_trunc('hour', dd) as interval
        FROM generate_series
        (
            (now() at time zone 'utc') - interval '1 day',
            (now() at time zone 'utc'),
            '1 hour'::interval
        ) dd
    ) i
left join transfers t on (t.from = ca.contract_address or t.to = ca.contract_address) and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;

Example output:

      interval       |              contract_address              | amount_ampl_bought | amount_ampl_sold | amount_ampl_transferred |     percent_ampl_bought     |     percent_ampl_sold      |  percent_ampl_transferred  
---------------------+--------------------------------------------+--------------------+------------------+-------------------------+-----------------------------+----------------------------+----------------------------
 2021-05-08 11:00:00 | 0x0000000000000000000000000000000000000000 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000000000000000000000000000dead |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000006f6502b7f2bbac8c30a3f67e9a |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000084e91743124a982076c59f10084 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000000eb4ec62758aae93400b3e5f7f18 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x00000000000017c75025d397b91d284bbe8fc7f2 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x0000000000005117dd3a72e64a705198753fdd54 |                  0 |                0 |                       0 |                           0 |                          0 |                          0
 2021-05-08 11:00:00 | 0x000000000000740a22fa209cf6806d38f7605385 |                  0 |                0 |                       0 |                           0 |                          0 |                          0

Link to query visualised:

https://explain.depesz.com/s/SrLf

Indexes I have created on transfers:

 CREATE INDEX transfers_from_to_index ON public.transfers USING btree ("from", "to")
 CREATE INDEX transfers_timestamp_index ON public.transfers USING btree ("timestamp")
 CREATE INDEX transfers_action_index ON public.transfers USING btree (action)
 CREATE UNIQUE INDEX transfers_pkey ON public.transfers USING btree (transaction_hash, log_index)
 CREATE INDEX transfers_supply_percentage_index ON public.transfers USING btree (supply_percentage)
 CREATE INDEX transfers_amount_index ON public.transfers USING btree (amount)
 CREATE INDEX transfers_supply_percentage_timestamp_log_index_index ON public.transfers USING btree (supply_percentage, "timestamp", log_index)
 CREATE INDEX transfers_date_trunc_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")))
 CREATE INDEX transfers_to_index ON public.transfers USING btree ("to")

Indexes I have created on addresses:

 CREATE UNIQUE INDEX addresses_pkey ON public.addresses USING btree (contract_address)
 CREATE INDEX addresses_supply_percentage_index ON public.addresses USING btree (supply_percentage)

Many thanks for your help with this optimisation!

I am pretty sure the problem is the or in the JOIN condition on transfers. Under reasonable assumptions you should be able to split this into two separate left joins:

-- Assumes that, within one hour, an address has matching rows on only one side
-- ("from" or "to"); otherwise the two joins multiply rows and the sums are wrong.
select i.interval, a.contract_address,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as amount_ampl_bought,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as amount_ampl_sold,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as amount_ampl_transferred,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as percent_ampl_bought,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as percent_ampl_sold,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as percent_ampl_transferred
from addresses a cross join
     -- same 24-hour window as the question
     generate_series(date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                     date_trunc('hour', now() at time zone 'utc'),
                     '1 hour'::interval
                    ) as i("interval") left join
      transfers tf
      on tf.from = a.contract_address and
         date_trunc('hour', tf.timestamp at time zone 'utc') = i.interval left join
      transfers tt
      on tt.to = a.contract_address and
         date_trunc('hour', tt.timestamp at time zone 'utc') = i.interval
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3
                 )
group by i.interval, a.contract_address;

Then for this query, you want indexes on:

  • address_tags(address, tag_id)
  • transfers(to, timestamp)
  • transfers(from, timestamp)

(Note that to and from are really bad names for columns because they are SQL keywords.)
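
As a sketch, the three indexes listed above could be created as follows. The index names are made up for illustration, and because "from" and "to" are reserved words they have to be double-quoted in the column list.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
CREATE INDEX address_tags_address_tag_id_idx ON address_tags (address, tag_id);
CREATE INDEX transfers_to_timestamp_idx      ON transfers ("to", "timestamp");
CREATE INDEX transfers_from_timestamp_idx    ON transfers ("from", "timestamp");
```

Since the join compares date_trunc('hour', ...) rather than the raw column, the planner may only be able to use the leading address column of the two transfers indexes as an index condition, so it is worth checking the resulting plan with EXPLAIN.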

The conversion of timestamp to UTC could also pose a problem. I would suggest that you fix your data so that timestamps are all in a common timezone, and I would suggest UTC for that purpose (to avoid issues with daylight saving time).
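
To illustrate what the conversion does on each row, here is a small sketch. It assumes the column is of type timestamptz, which the question does not state; in that case PostgreSQL already stores the instant in UTC, and AT TIME ZONE 'utc' only turns it into a timestamp without time zone. The literal value is made up.

```sql
-- Hypothetical literal; assumes the real column is timestamptz.
select ts at time zone 'utc'                     as utc_wall_clock,  -- 2021-05-08 11:30:00
       date_trunc('hour', ts at time zone 'utc') as utc_hour         -- 2021-05-08 11:00:00
from (values (timestamptz '2021-05-08 13:30:00+02')) as v(ts);
```

The join condition has to repeat this expression exactly for the existing transfers_date_trunc_idx expression index to be a candidate.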

It looks like the query is already doing most of the work for all time periods, and only filters out the ones you didn't ask for after most of that work is done. So if you want a different time period, just run it for that period. If that is still too slow, post the plan for that query. Then at least we would be optimizing the right query.
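
To capture the plan for a wider time range, one option is to prefix the query with EXPLAIN (ANALYZE, BUFFERS). The sketch below uses a cut-down version of the question's join and an arbitrary 30-day window, both chosen only for illustration; substitute the real query and period.

```sql
-- ANALYZE actually executes the query; BUFFERS adds buffer hit/read counts.
EXPLAIN (ANALYZE, BUFFERS)
select i."interval", count(t.transaction_hash) as transfer_count
from generate_series(
         date_trunc('hour', (now() at time zone 'utc') - interval '30 days'),
         date_trunc('hour', now() at time zone 'utc'),
         '1 hour'::interval
     ) as i("interval")
left join transfers t
       on date_trunc('hour', t."timestamp" at time zone 'utc') = i."interval"
group by i."interval";
```

The text output can be pasted into explain.depesz.com, as was done for the plan linked in the question.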

Could you give the query below a try? AFAIK there is no reason to cram everything into 1 query, so I split off some of the parts. I also split the or into 2 parts, which should allow for better usage of the indexes. And then I noticed that's exactly what Gordon already did above (so much for thinking I was so clever for finding a workaround that's probably faster than UNION ALL =)

I also added a WHERE on action, as I am not sure whether there are values other than 0, 1, 2. If there are not, you can remove it again.

PS: untested and working blind here, simply being curious (and hopeful =)

DROP TABLE IF EXISTS _combined;

WITH intervals
  AS ( 
       SELECT i as interval
          FROM generate_series(
                                date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                                date_trunc('hour', (now() at time zone 'utc')),
                                '1 hour'::interval
                            ) AS i    -- alias so that "i" resolves in the SELECT list
     ),
     adrs 
  AS (
        SELECT a.contract_address
          FROM addresses a 
        EXCEPT
        SELECT at.address 
          FROM address_tags at
         WHERE at.tag_id = 3)
         
SELECT a.contract_address, i.interval
  INTO TEMPORARY TABLE _combined
  FROM intervals i
 CROSS JOIN adrs a;

CREATE UNIQUE INDEX uq_combined ON _combined ("interval", contract_address);

SELECT c.interval, 
       c.contract_address,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as amount_ampl_bought,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as amount_ampl_sold,
       COALESCE(SUM(COALESCE(tf.amount           , tt.amount           , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as amount_ampl_transferred,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as percent_ampl_bought,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as percent_ampl_sold,
       COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as percent_ampl_transferred
  FROM _combined c

  LEFT OUTER JOIN transfers tf 
               ON tf.from = c.contract_address  
              AND date_trunc('hour', tf.timestamp at time zone 'utc') = c.interval
              AND tf.action IN (0, 1, 2)

  LEFT OUTER JOIN transfers tt 
               ON tt.to = c.contract_address 
              AND date_trunc('hour', tt.timestamp at time zone 'utc') = c.interval
              AND tt.action IN (0, 1, 2)
       
group by c.interval, c.contract_address;

Ideal indexes for this query would be:

CREATE INDEX transfers_date_trunc_to_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "to") INCLUDE (action, amount, supply_percentage);
CREATE INDEX transfers_date_trunc_from_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "from") INCLUDE (action, amount, supply_percentage);
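
For comparison, the UNION ALL variant mentioned earlier in this answer might look like the sketch below. It is not taken from any of the answers and is untested against the real schema. The idea is to turn each transfer into one row per involved address, so that a single join replaces the or, and rows are not multiplied when an address sends and receives in the same hour. The CTE name involved and the alias names are made up.

```sql
-- Sketch only: one row per (transfer, involved address), then a single join.
with involved as (
    select t."from" as address, t."timestamp", t.action, t.amount, t.supply_percentage
    from transfers t
    union all
    select t."to", t."timestamp", t.action, t.amount, t.supply_percentage
    from transfers t
    -- the original OR matched a self-transfer once, so skip it on this side
    where t."to" is distinct from t."from"
)
select h."interval", a.contract_address,
       coalesce(sum(v.amount) filter (where v.action = 0), 0) as amount_ampl_bought,
       coalesce(sum(v.amount) filter (where v.action = 1), 0) as amount_ampl_sold,
       coalesce(sum(v.amount) filter (where v.action = 2), 0) as amount_ampl_transferred,
       coalesce(sum(v.supply_percentage) filter (where v.action = 0), 0) as percent_ampl_bought,
       coalesce(sum(v.supply_percentage) filter (where v.action = 1), 0) as percent_ampl_sold,
       coalesce(sum(v.supply_percentage) filter (where v.action = 2), 0) as percent_ampl_transferred
from addresses a
cross join generate_series(
         date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
         date_trunc('hour', now() at time zone 'utc'),
         '1 hour'::interval
     ) as h("interval")
left join involved v
       on v.address = a.contract_address
      and date_trunc('hour', v."timestamp" at time zone 'utc') = h."interval"
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3)
group by h."interval", a.contract_address;
```

Whether this is faster than the two left joins depends on the plan PostgreSQL chooses, so it is worth comparing both with EXPLAIN (ANALYZE, BUFFERS) on the real data.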
