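How can I improve the performance of my PostgreSQL query?
I have a query that returns the sums of buys, sells, and transfers per account, grouped by time interval. The problem is that it is slow even though I only look at transfers from the last 24 hours, and I would like to be able to run it over all transfers (800,000 over 2 years). How can I optimize it?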
select
i.interval, ca.contract_address,
coalesce(SUM(t.amount) FILTER (WHERE t.action = 0), 0) as amount_ampl_bought,
coalesce(SUM(t.amount) FILTER (WHERE t.action = 1), 0) as amount_ampl_sold,
coalesce(SUM(t.amount) FILTER (WHERE t.action = 2), 0) as amount_ampl_transferred,
coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 0), 0) as percent_ampl_bought,
coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 1), 0) as percent_ampl_sold,
coalesce(SUM(t.supply_percentage) FILTER (WHERE t.action = 2), 0) as percent_ampl_transferred
from
(
select contract_address
from addresses a
where not exists (select 1 from address_tags at where at.address = a.contract_address and at.tag_id = 3)
) ca
cross join
(
SELECT date_trunc('hour', dd) as interval
FROM generate_series
(
(now() at time zone 'utc') - interval '1 day',
(now() at time zone 'utc'),
'1 hour'::interval
) dd
) i
left join transfers t on (t.from = ca.contract_address or t.to = ca.contract_address) and date_trunc('hour', t.timestamp at time zone 'utc') = i.interval
group by i.interval, ca.contract_address;
Sample output:
interval | contract_address | amount_ampl_bought | amount_ampl_sold | amount_ampl_transferred | percent_ampl_bought | percent_ampl_sold | percent_ampl_transferred
---------------------+--------------------------------------------+--------------------+------------------+-------------------------+-----------------------------+----------------------------+----------------------------
2021-05-08 11:00:00 | 0x0000000000000000000000000000000000000000 | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x000000000000000000000000000000000000dead | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x000000000000006f6502b7f2bbac8c30a3f67e9a | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x000000000000084e91743124a982076c59f10084 | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x0000000000000eb4ec62758aae93400b3e5f7f18 | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x00000000000017c75025d397b91d284bbe8fc7f2 | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x0000000000005117dd3a72e64a705198753fdd54 | 0 | 0 | 0 | 0 | 0 | 0
2021-05-08 11:00:00 | 0x000000000000740a22fa209cf6806d38f7605385 | 0 | 0 | 0 | 0 | 0 | 0
Link to the visualized query plan:
https://explain.depesz.com/s/SrLf
Indexes I have created on transfers:
CREATE INDEX transfers_from_to_index ON public.transfers USING btree ("from", "to")
CREATE INDEX transfers_timestamp_index ON public.transfers USING btree ("timestamp")
CREATE INDEX transfers_action_index ON public.transfers USING btree (action)
CREATE UNIQUE INDEX transfers_pkey ON public.transfers USING btree (transaction_hash, log_index)
CREATE INDEX transfers_supply_percentage_index ON public.transfers USING btree (supply_percentage)
CREATE INDEX transfers_amount_index ON public.transfers USING btree (amount)
CREATE INDEX transfers_supply_percentage_timestamp_log_index_index ON public.transfers USING btree (supply_percentage, "timestamp", log_index)
CREATE INDEX transfers_date_trunc_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")))
CREATE INDEX transfers_to_index ON public.transfers USING btree ("to")
Indexes I have created on addresses:
CREATE UNIQUE INDEX addresses_pkey ON public.addresses USING btree (contract_address)
CREATE INDEX addresses_supply_percentage_index ON public.addresses USING btree (supply_percentage)
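Any help with optimizing this would be much appreciated!
I am pretty sure the problem is the or in the JOIN condition on transfers. Under reasonable assumptions, you should be able to split it into two separate left joins: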
-- Assumes an address does not have both outgoing and incoming transfers in the
-- same hour; otherwise the two joins pair those rows up and the sums are off.
select i.interval, a.contract_address,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as amount_ampl_bought,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as amount_ampl_sold,
       coalesce(SUM(COALESCE(tt.amount, tf.amount)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as amount_ampl_transferred,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 0), 0) as percent_ampl_bought,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 1), 0) as percent_ampl_sold,
       coalesce(SUM(COALESCE(tt.supply_percentage, tf.supply_percentage)) FILTER (WHERE COALESCE(tt.action, tf.action) = 2), 0) as percent_ampl_transferred
from addresses a cross join
     -- one row per hour; '1 day' matches the 24-hour window in the question
     generate_series(date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
                     date_trunc('hour', now() at time zone 'utc'),
                     '1 hour'::interval
                    ) i("interval") left join
     -- transfers sent by the address
     transfers tf
     on tf.from = a.contract_address and
        date_trunc('hour', tf.timestamp at time zone 'utc') = i.interval left join
     -- transfers received by the address
     transfers tt
     on tt.to = a.contract_address and
        date_trunc('hour', tt.timestamp at time zone 'utc') = i.interval
where not exists (select 1
                  from address_tags at
                  where at.address = a.contract_address and at.tag_id = 3
                 )
group by i.interval, a.contract_address;
Then for this query, you want the following indexes:
address_tags(address, tag_id)
transfers(to, timestamp)
transfers(from, timestamp)
(Note that to and from are really bad column names, because they are SQL keywords.)
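Written out as DDL, the three indexes listed above might look like the sketch below. The `_demo` tables are hypothetical stand-ins with assumed column types so that the snippet runs on its own, and the index names are made up. It also shows the practical cost of the column names: `to` and `from` are reserved words in PostgreSQL, so they need double quotes in a column list.

```sql
-- Hypothetical stand-ins for the real tables; the column types are assumptions.
CREATE TEMP TABLE transfers_demo (
    "from"      text        NOT NULL,
    "to"        text        NOT NULL,
    "timestamp" timestamptz NOT NULL
);
CREATE TEMP TABLE address_tags_demo (
    address text    NOT NULL,
    tag_id  integer NOT NULL
);

-- Index names are made up for this sketch.
CREATE INDEX address_tags_demo_address_tag_idx ON address_tags_demo (address, tag_id);

-- "to" and "from" are reserved words, so they must be double-quoted here.
CREATE INDEX transfers_demo_to_ts_idx   ON transfers_demo ("to", "timestamp");
CREATE INDEX transfers_demo_from_ts_idx ON transfers_demo ("from", "timestamp");
```

On the real tables, the same statements would target public.transfers and address_tags instead of the demo tables.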
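The conversion of timestamp to UTC may also cause problems. I would suggest fixing your data so that the timestamps are all in one common time zone; I recommend UTC for this (to avoid daylight saving time issues).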
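One way to keep the conversion out of the join condition is to compare the raw column against a half-open hourly range, which a plain index on ("to", "timestamp") can serve. This is only a sketch: it assumes the column is timestamptz, which the question does not state, and the demo table, its rows, and the address '0xaaa' are hypothetical.

```sql
-- Hypothetical demo table; the real column types are an assumption.
CREATE TEMP TABLE transfers_range_demo (
    "to"        text        NOT NULL,
    "timestamp" timestamptz NOT NULL,
    amount      numeric     NOT NULL
);
CREATE INDEX transfers_range_demo_to_ts_idx
    ON transfers_range_demo ("to", "timestamp");

INSERT INTO transfers_range_demo VALUES
    ('0xaaa', '2021-05-08 11:15:00+00', 10),
    ('0xaaa', '2021-05-08 11:45:00+00', 5),
    ('0xaaa', '2021-05-08 12:05:00+00', 7);

-- Hourly buckets as timestamptz; each transfer falls into exactly one
-- half-open range [bucket_start, bucket_start + 1 hour).
SELECT b.bucket_start,
       coalesce(sum(t.amount), 0) AS amount_received
FROM generate_series(timestamptz '2021-05-08 11:00:00+00',
                     timestamptz '2021-05-08 13:00:00+00',
                     interval '1 hour') AS b(bucket_start)
LEFT JOIN transfers_range_demo t
       ON t."to" = '0xaaa'
      AND t."timestamp" >= b.bucket_start
      AND t."timestamp" <  b.bucket_start + interval '1 hour'
GROUP BY b.bucket_start
ORDER BY b.bucket_start;
-- amount_received for the three buckets: 15, 7, 0
```

Because the column itself is not wrapped in a function, no expression index is needed for this form of the condition.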
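It looks like the query is already doing most of the work for all time periods, and only filters out the ones you did not ask for after most of that work is done. So if you want a different time period, just go ahead and run it. If that is still too slow, then post the plan. Then at least we will be optimizing the right query.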
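To capture a plan for the wider window, the statement can be prefixed with EXPLAIN (ANALYZE, BUFFERS). The sketch below applies it only to the interval generator, widened to an assumed two-year window, so that it runs without the real tables; in practice the full query would follow the EXPLAIN keyword.

```sql
-- EXPLAIN (ANALYZE, BUFFERS) executes the statement and reports the actual
-- plan, timings, and buffer usage. The '2 years' window is an assumption
-- based on the data volume mentioned in the question.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM generate_series(
         date_trunc('hour', (now() at time zone 'utc') - interval '2 years'),
         date_trunc('hour', now() at time zone 'utc'),
         interval '1 hour'
     ) AS i("interval");
```

The text output can be pasted into explain.depesz.com, as was done for the plan linked in the question.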
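Could you give the following a try? AFAIK there is no reason to cram everything into one query, so I split off some parts. I also split the or into two parts, which should make better use of the indexes. Then I noticed that this is exactly what Gordon did above (up to that point I thought I had been rather clever in finding a workaround that might be faster than UNION ALL =)
I also added a WHERE on action; I am not sure whether there are values other than 0, 1, 2. If there are not, you can remove it again.
PS: this is untested and written blind, I am just curious (and hopeful =)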
DROP TABLE IF EXISTS _combined;
-- Build the (interval, address) grid once and keep it in a temporary table.
WITH intervals
AS (
SELECT i AS interval
FROM generate_series(
date_trunc('hour', (now() at time zone 'utc') - interval '1 day'),
date_trunc('hour', (now() at time zone 'utc')),
'1 hour'::interval
) AS i
),
adrs
AS (
-- all addresses except those tagged with tag_id = 3
SELECT a.contract_address
FROM addresses a
EXCEPT
SELECT at.address
FROM address_tags at
WHERE at.tag_id = 3)
SELECT a.contract_address, i.interval
INTO TEMPORARY TABLE _combined
FROM intervals i
CROSS JOIN adrs a;
CREATE UNIQUE INDEX uq_combined ON _combined ("interval", contract_address);
-- Aggregate per grid cell; the action is taken from whichever side matched.
SELECT c.interval,
c.contract_address,
COALESCE(SUM(COALESCE(tf.amount , tt.amount , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as amount_ampl_bought,
COALESCE(SUM(COALESCE(tf.amount , tt.amount , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as amount_ampl_sold,
COALESCE(SUM(COALESCE(tf.amount , tt.amount , 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as amount_ampl_transferred,
COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 0), 0) as percent_ampl_bought,
COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 1), 0) as percent_ampl_sold,
COALESCE(SUM(COALESCE(tf.supply_percentage, tt.supply_percentage, 0)) FILTER (WHERE COALESCE(tf.action, tt.action) = 2), 0) as percent_ampl_transferred
FROM _combined c
LEFT OUTER JOIN transfers tf
ON tf.from = c.contract_address
AND date_trunc('hour', tf.timestamp at time zone 'utc') = c.interval
AND tf.action IN (0, 1, 2)
LEFT OUTER JOIN transfers tt
ON tt.to = c.contract_address
AND date_trunc('hour', tt.timestamp at time zone 'utc') = c.interval
AND tt.action IN (0, 1, 2)
group by c.interval, c.contract_address;
The ideal indexes for this query would be:
CREATE INDEX transfers_date_trunc_to_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "to") INCLUDE (action, amount, supply_percentage);
CREATE INDEX transfers_date_trunc_from_idx ON public.transfers USING btree (date_trunc('hour'::text, timezone('utc'::text, "timestamp")), "from") INCLUDE (action, amount, supply_percentage);
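Before relying on the two-join form, it may be worth comparing its sums with the UNION ALL variant mentioned above. With two left joins on the same address and hour, every outgoing row is paired with every incoming row, so an address that both sends and receives within one hour can end up with distorted sums. The sketch below shows the UNION ALL shape on a hypothetical demo table (table name, rows, and addresses are made up, and the hourly bucketing is left out for brevity):

```sql
-- Hypothetical demo data; column types are assumptions.
CREATE TEMP TABLE transfers_union_demo (
    "from"  text     NOT NULL,
    "to"    text     NOT NULL,
    action  smallint NOT NULL,
    amount  numeric  NOT NULL
);
INSERT INTO transfers_union_demo VALUES
    ('0xaaa', '0xbbb', 1, 10),  -- 0xaaa sends
    ('0xaaa', '0xccc', 1, 5),   -- 0xaaa sends
    ('0xddd', '0xaaa', 0, 7);   -- 0xaaa receives

-- Each transfer is listed once per side, so no row is multiplied.
WITH sides AS (
    SELECT "from" AS contract_address, action, amount
    FROM transfers_union_demo
    UNION ALL
    SELECT "to" AS contract_address, action, amount
    FROM transfers_union_demo
    WHERE "to" <> "from"  -- count a self-transfer only once, as the OR join does
)
SELECT contract_address,
       coalesce(sum(amount) FILTER (WHERE action = 0), 0) AS amount_bought,
       coalesce(sum(amount) FILTER (WHERE action = 1), 0) AS amount_sold
FROM sides
WHERE contract_address = '0xaaa'
GROUP BY contract_address;
-- 0xaaa: amount_bought = 7, amount_sold = 15
```

Each branch of the UNION ALL filters on a single column, so the ("from", ...) and ("to", ...) indexes can each serve their own branch.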