Postgres high CPU usage on an AFTER INSERT trigger

I have an application where I receive a stream of ticks (buys or sells of a commodity), and I am trying to generate a table of per-minute OHLC (open, high, low, close) bars from this data. I am storing these bars in their own table rather than deriving them from the tick table on demand because of the high volume of ticks I receive (10,000,000 per day). With this strategy I can delete all the ticks from the database on a schedule to keep the database size manageable.
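For a sense of scale, here are the average rates implied by that volume. This is a back-of-the-envelope sketch that assumes ticks are spread evenly over a 24-hour day; real market traffic is burstier.

```python
# Average rates implied by the volume quoted in the question.
# Assumption: ticks are spread evenly over 24 hours (real traffic is burstier).
TICKS_PER_DAY = 10_000_000
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400
MINUTES_PER_DAY = 24 * 60        # 1,440

ticks_per_second = TICKS_PER_DAY / SECONDS_PER_DAY
ticks_per_minute = TICKS_PER_DAY / MINUTES_PER_DAY

print(f"{ticks_per_second:.1f} ticks/s, {ticks_per_minute:.0f} ticks/min")
# prints "115.7 ticks/s, 6944 ticks/min"
```

At that average rate, each one-minute bar summarizes several thousand ticks.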

My schema is roughly equivalent to this (unnecessary columns removed for brevity).

CREATE TABLE tick (
    executed TIMESTAMP WITH TIME ZONE NOT NULL,
    price NUMERIC
);

CREATE TABLE ohlc_minute (
    created TIMESTAMP WITH TIME ZONE NOT NULL PRIMARY KEY,  -- start of the minute this bar covers
    open NUMERIC,
    high NUMERIC,
    low NUMERIC,
    close NUMERIC
);

My idea was to create an AFTER INSERT trigger on tick that computes the OHLC values for the minute containing the new tick and upserts them into the ohlc_minute table. With this trigger enabled, however, CPU usage on the database jumps to 100% almost instantly.

CREATE OR REPLACE FUNCTION update_ohlc()
    RETURNS trigger AS
$BODY$
BEGIN
    -- Recompute the bar for the minute containing NEW.executed from all of that minute's ticks.
    INSERT INTO ohlc_minute (created, open, high, low, close)
        SELECT
            date_trunc('minute', NEW.executed) AS created,
            (array_agg(price ORDER BY executed ASC))[1] AS open,
            MAX(price) AS high,
            MIN(price) AS low,
            (array_agg(price ORDER BY executed DESC))[1] AS close
        FROM tick
        -- Half-open range: BETWEEN is inclusive at both ends and could also match
        -- a tick stamped exactly at the start of the next minute.
        WHERE executed >= date_trunc('minute', NEW.executed)
          AND executed <  date_trunc('minute', NEW.executed) + interval '1 minute'
    ON CONFLICT (created)
    DO UPDATE
    SET open = EXCLUDED.open, high = EXCLUDED.high, low = EXCLUDED.low, close = EXCLUDED.close;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

CREATE TRIGGER tick_insert
    AFTER INSERT
    ON tick
    FOR EACH ROW 
    EXECUTE PROCEDURE update_ohlc();
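The SELECT inside this trigger rebuilds the whole bar from scratch on every insert. A small Python model of that recomputation (an illustration only, not PostgreSQL code; the helper names are hypothetical, timestamps are assumed to be UTC, and `Decimal` stands in for NUMERIC) makes the per-insert work explicit. It uses a half-open minute window, whereas SQL `BETWEEN` is inclusive at both ends:

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def minute_start(ts: datetime) -> datetime:
    # Rough stand-in for date_trunc('minute', ts) on a UTC timestamp.
    return ts.replace(second=0, microsecond=0)


def recompute_bar(ticks, new_ts):
    """Rebuild the OHLC bar for the minute containing new_ts from scratch.

    ticks is a list of (executed, price) pairs. As in an AFTER INSERT
    trigger, new_ts is assumed to belong to a tick already in the list,
    so the window is never empty.
    """
    start = minute_start(new_ts)
    end = start + timedelta(minutes=1)
    # Every tick is examined on every call (like a scan with no usable index);
    # the half-open test leaves a tick stamped exactly at `end` to the next bar.
    window = sorted((t for t in ticks if start <= t[0] < end), key=lambda t: t[0])
    prices = [price for _, price in window]
    return start, prices[0], max(prices), min(prices), prices[-1]


def at(minute, second):
    return datetime(2024, 1, 1, 12, minute, second, tzinfo=timezone.utc)


ticks = [
    (at(0, 10), Decimal("10")),
    (at(0, 30), Decimal("12")),
    (at(0, 50), Decimal("9")),
    (at(0, 55), Decimal("11")),
    (at(1, 0), Decimal("99")),  # first tick of the 12:01 bar
]

created, open_, high, low, close = recompute_bar(ticks, at(0, 55))
print(open_, high, low, close)  # 10 12 9 11
```

Here the 12:01:00 tick is excluded from the 12:00 bar; with an inclusive upper bound it would have become that bar's high and close.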

One possible alternative is to run an equivalent function on a schedule to update all OHLC bars, but I like the idea of always having up-to-date partial OHLC information available (e.g. for the current bar, which is less than one minute old). Are there any easy optimisations I can make to lower the CPU usage of my trigger function?

Are ticks guaranteed to arrive in order? If so, then whenever the insert succeeds, your aggregation has been done over only one row, so the answer to every aggregate is just that tick's price. If the insert conflicts, you should be able to compute each value from just the existing row and the excluded (new) one.

CREATE OR REPLACE FUNCTION update_ohlc()
    RETURNS trigger AS
$BODY$
BEGIN
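    -- With in-order ticks, a successful INSERT means this is the minute's first
    -- tick, so open = high = low = close = its price.
    INSERT INTO ohlc_minute (created, open, high, low, close)
        VALUES (
            date_trunc('minute', NEW.executed),
            NEW.price,
            NEW.price,
            NEW.price,
            NEW.price
        )
    -- On conflict, open keeps its stored value, high/low widen, and close
    -- becomes the newest price.
    ON CONFLICT (created)
    DO UPDATE
    SET high = greatest(ohlc_minute.high, EXCLUDED.high),
        low = least(ohlc_minute.low, EXCLUDED.low),
        close = EXCLUDED.close;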
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

If they are not guaranteed to arrive in order, then I think your current solution would be about optimal, if you insist on having partial results available within the accruing minute.
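To see why arrival order matters for the incremental version, here is a small Python model of its ON CONFLICT logic (an illustration, not PostgreSQL code; the helper names are hypothetical and plain floats stand in for NUMERIC). The first tick of a minute inserts a bar with all four values equal to its price; each later tick keeps open, widens high/low, and overwrites close.

```python
from datetime import datetime, timezone


def minute_start(ts):
    # Rough stand-in for date_trunc('minute', ts) on a UTC timestamp.
    return ts.replace(second=0, microsecond=0)


def upsert_tick(bars, ts, price):
    """Apply one tick the way the incremental trigger does."""
    key = minute_start(ts)
    bar = bars.get(key)
    if bar is None:
        # INSERT succeeds: a one-row aggregate, so open = high = low = close.
        bars[key] = {"open": price, "high": price, "low": price, "close": price}
    else:
        # ON CONFLICT DO UPDATE: open untouched, close = latest *arrival*.
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price


def bars_from(arrivals):
    bars = {}
    for ts, price in arrivals:
        upsert_tick(bars, ts, price)
    return bars


def at(second):
    return datetime(2024, 1, 1, 12, 0, second, tzinfo=timezone.utc)


in_order = [(at(10), 10.0), (at(30), 12.0), (at(50), 9.0), (at(55), 11.0)]
shuffled = [in_order[1], in_order[3], in_order[0], in_order[2]]

print(bars_from(in_order)[at(0)])
# {'open': 10.0, 'high': 12.0, 'low': 9.0, 'close': 11.0}
print(bars_from(shuffled)[at(0)])
# {'open': 12.0, 'high': 12.0, 'low': 9.0, 'close': 9.0}
```

Both runs agree on high and low, because greatest and least give the same result in any order; open and close come out wrong for the shuffled arrivals, which is why the incremental version relies on in-order ticks.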

I solved my own problem: the cause was the obvious one, a missing index. I created an index

CREATE INDEX IF NOT EXISTS execute_index ON tick (executed);

and CPU usage has fallen to an acceptable level. I would, however, still be interested in seeing optimized solutions.
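The index helps because the trigger's WHERE clause is a range condition on executed: with a B-tree index on that column PostgreSQL can fetch just the rows for one minute, whereas without one each insert has to scan the whole tick table, which keeps growing until the next purge. As a loose Python analogy (not how PostgreSQL is implemented; the data and helper names are hypothetical), compare a full scan with a binary search over a sorted list of keys:

```python
import bisect
from datetime import datetime, timedelta, timezone


def seq_scan(rows, start, end):
    """Examine every row, as a sequential scan must without a usable index."""
    hits = [r for r in rows if start <= r[0] < end]
    return hits, len(rows)  # rows examined = the whole table


def index_range_scan(rows, keys, start, end):
    """Binary-search sorted keys, loosely like a B-tree range scan.

    rows must be sorted by timestamp and keys[i] == rows[i][0].
    The count returned is the rows read from the range; the O(log n)
    search for the boundaries is not counted.
    """
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    hits = rows[lo:hi]
    return hits, len(hits)


base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical data: one tick per second for an hour, sorted by time.
rows = [(base + timedelta(seconds=s), 100.0 + s % 7) for s in range(3600)]
keys = [ts for ts, _ in rows]

start = base + timedelta(minutes=30)
end = start + timedelta(minutes=1)

full_hits, full_read = seq_scan(rows, start, end)
idx_hits, idx_read = index_range_scan(rows, keys, start, end)
print(len(idx_hits), full_read, idx_read)  # 60 3600 60
```

Both lookups return the same 60 rows; the difference is that the indexed lookup's cost depends on how many ticks fall in the minute rather than on the size of the whole table.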
