简体   繁体   中英

Inserting aggregates into big table is too slow

I need to run a query as cronjob and it is too slow to perform. It sums a value from a view table ( data_time_series ) for each pair of aks and ii and inserts a row in storage_time_series if those values are not present there yet: Here is the query:

INSERT INTO storage_time_series (time, value, akd, ii)
SELECT data.time, data.value, data.akd, data.ii FROM
   ( SELECT 
        now() AS time,
        SUM(value) AS value,
        labels[2] AS akd,
        labels[4] AS ii
    FROM data_time_series
    GROUP BY akd, ii) AS data
WHERE NOT EXISTS (
    SELECT 1 FROM storage_time_series AS dst WHERE dst.value = data.value AND dst.access_key_id = data.akd AND dst.instance_id = data.ii );

The first part of the query produces around 200K rows and is fast:

SELECT 
        now() AS time,
        SUM(nullif(value, 'NaN'))  AS value,
        labels[2] AS akd,
        labels[4] AS ii
    FROM public.data_time_series
    GROUP BY akd, ii

The second part of the query checks if the data exists in storage_time_series . This part is too slow:

WHERE NOT EXISTS (
    SELECT 1 FROM storage_time_series AS dst WHERE dst.value = data.value AND dst.access_key_id = data.akd AND dst.instance_id = data.ii)

storage_time_series is a huge table and I need to keep the data and akd and ii are indexed on storage_time_series .

Here is the tables schema:

SELECT version() :
PostgreSQL 12.9 on x86_64-pc-linux-musl, compiled by gcc (Alpine 10.3.1_git20211027) 10.3.1 20211027, 64-bit
CREATE TABLE storage_time_series
(
    "time" timestamp with time zone,
    value double precision,
    akd integer,
    ii integer
)

CREATE  INDEX 
    ON storage_time_series USING btree
    (akd ASC NULLS LAST, ii ASC NULLS LAST, value ASC NULLS LAST)

Number of rows on storage_time_series table: ~190 million data_time_series is a view and has about 1.3 million rows.

Is there any way to speed up this process?

EDITED: I've already tested creating unique index over akd , ii and value and changed query to simple INSERT and ON CONFLICT DO NOTHING , but seems having unique index make insert process very slower than before.

akd and ii are indexed on storage_time_series

Sounds good. But too vague to say more. If, as it seems, (akd, ii, value) is supposed to be unique, create a UNIQUE INDEX accordingly:

CREATE UNIQUE INDEX ON storage_time_series (akd, ii, value);

Then your query burns down to:

INSERT INTO storage_time_series
      (akd      , ii       , value     , time )
SELECT labels[2], labels[4], sum(value), now())
FROM   data_time_series
GROUP  BY 1, 2
ON     CONFLICT DO NOTHING;

Which should be fast. Much depends undisclosed details.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM