I need to run a query as cronjob and it is too slow to perform. It sums a value from a view table ( data_time_series
) for each pair of aks
and ii
and inserts a row in storage_time_series
if those values are not present there yet: Here is the query:
INSERT INTO storage_time_series (time, value, akd, ii)
SELECT data.time, data.value, data.akd, data.ii FROM
( SELECT
now() AS time,
SUM(value) AS value,
labels[2] AS akd,
labels[4] AS ii
FROM data_time_series
GROUP BY akd, ii) AS data
WHERE NOT EXISTS (
SELECT 1 FROM storage_time_series AS dst WHERE dst.value = data.value AND dst.access_key_id = data.akd AND dst.instance_id = data.ii );
The first part of the query produces around 200K rows and is fast:
SELECT
now() AS time,
SUM(nullif(value, 'NaN')) AS value,
labels[2] AS akd,
labels[4] AS ii
FROM public.data_time_series
GROUP BY akd, ii
The second part of the query checks if the data exists in storage_time_series
. This part is too slow:
WHERE NOT EXISTS (
SELECT 1 FROM storage_time_series AS dst WHERE dst.value = data.value AND dst.access_key_id = data.akd AND dst.instance_id = data.ii)
storage_time_series
is a huge table and I need to keep the data and akd
and ii
are indexed on storage_time_series
.
Here is the tables schema:
SELECT version() :
PostgreSQL 12.9 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
CREATE TABLE storage_time_series
(
"time" timestamp with time zone,
value double precision,
akd integer,
ii integer
)
CREATE INDEX
ON storage_time_series USING btree
(akd ASC NULLS LAST, ii ASC NULLS LAST, value ASC NULLS LAST)
Number of rows on storage_time_series table: ~190 million data_time_series is a view and has about 1.3 million rows.
Is there any way to speed up this process?
EDITED: I've already tested creating unique index
over akd , ii and value and changed query to simple INSERT and ON CONFLICT DO NOTHING , but seems having unique index
make insert process very slower than before.
akd
andii
are indexed onstorage_time_series
Sounds good. But too vague to say more. If, as it seems, (akd, ii, value)
is supposed to be unique, create a UNIQUE INDEX
accordingly:
CREATE UNIQUE INDEX ON storage_time_series (akd, ii, value);
Then your query burns down to:
INSERT INTO storage_time_series
(akd , ii , value , time )
SELECT labels[2], labels[4], sum(value), now())
FROM data_time_series
GROUP BY 1, 2
ON CONFLICT DO NOTHING;
Which should be fast. Much depends undisclosed details.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.