
How to speed up a SUM query on a large table in Postgres

The problem

I'm trying to run the following query against a SQL view in a Postgres database:

SELECT sum(value) FROM invoices_view;

The invoices_view has approximately 45 million rows; the entire database is 40.5 GB on disk, and the server has 61 GB of RAM.

Currently this query takes 4.5 seconds, and I'd like it to run in under 1 second.

Things I've tried

I cannot add indexes directly to the SQL view, of course, but I do have an index on the underlying table:

CREATE INDEX invoices_on_value_idx ON invoices (value);

I have also run a VACUUM ANALYZE on the invoices table.
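
One relevant check here (a hedged aside, using the standard pg_class catalog): a SUM over the value index can avoid heap fetches only for pages marked all-visible, which VACUUM maintains. To see the coverage:

-- Pages in the table vs. pages marked all-visible; an index-only
-- scan must still visit the heap for any page not all-visible.
SELECT relpages, relallvisible
FROM pg_class
WHERE relname = 'invoices';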

EXPLAIN ANALYZE

The output of EXPLAIN ANALYZE is as follows:

EXPLAIN (ANALYZE, BUFFERS) SELECT sum(value) FROM invoices_view;
Finalize Aggregate  (cost=1514195.47..1514195.47 rows=1 width=32) (actual time=5102.805..5102.806 rows=1 loops=1)
  Buffers: shared hit=14996 read=1446679
  I/O Timings: read=3235.147
  ->  Gather  (cost=1514195.16..1514195.47 rows=3 width=32) (actual time=5102.716..5109.229 rows=4 loops=1)
        Workers Planned: 3
        Workers Launched: 3
        Buffers: shared hit=14996 read=1446679
        I/O Timings: read=3235.147
        ->  Partial Aggregate  (cost=1513195.16..1513195.17 rows=1 width=32) (actual time=5097.626..5097.626 rows=1 loops=4)
              Buffers: shared hit=14996 read=1446679
              I/O Timings: read=3235.147
              ->  Parallel Seq Scan on invoices  (cost=0.00..1505835.14 rows=14720046 width=6) (actual time=0.049..3734.495 rows=11408036 loops=4)
                    Buffers: shared hit=14996 read=1446679
                    I/O Timings: read=3235.147
Planning Time: 2.503 ms
Execution Time: 5109.327 ms

Does anyone have any thoughts on how I might speed this up? Or should I be looking at alternatives to Postgres at this point?

More detail

This is the simplest version of the queries I'll need to run over the dataset.

For example, I need to be able to SUM based on user input, i.e. with additional WHERE clauses and GROUP BYs (see the sketch below).

Keeping a running total would solve for this simplest case only.
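
For illustration, a query along these lines, where user_id and created_at are hypothetical stand-ins for whatever columns the user filters on:

-- Sum per user, restricted by a user-supplied date filter.
SELECT user_id, sum(value)
FROM invoices_view
WHERE created_at >= '2020-01-01'
GROUP BY user_id;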

You should consider using a trigger to keep track of a rolling sum:

CREATE OR REPLACE FUNCTION func_sum_invoice()
RETURNS trigger AS
$BODY$
BEGIN
    -- Add the newly inserted row's value to the single-row running total
    UPDATE invoices_sum
    SET total = total + NEW.value;
    RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;

Then create the trigger using this function:

CREATE TRIGGER sum_invoice
AFTER INSERT ON invoices
FOR EACH ROW
EXECUTE PROCEDURE func_sum_invoice();

Now each insert into the invoices table will fire a trigger that maintains the rolling sum. To obtain that sum, you need only a single-row select, which should be very fast:

SELECT total
FROM invoices_sum;
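
This assumes a one-row invoices_sum table that you create and seed once; a minimal sketch:

-- One-time setup: a single-row table holding the running total,
-- seeded from the existing data.
CREATE TABLE invoices_sum AS
SELECT sum(value) AS total
FROM invoices;

Note that the trigger above handles INSERTs only; if rows in invoices can be updated or deleted, you would need similar triggers for those events to keep the total correct.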

If your table is INSERT-only, there are ways to get your sums (much) faster.

Assuming there is a column with monotonically increasing values (like id or a created timestamp), create a MATERIALIZED VIEW to pre-compute sums older than a (recent) given threshold, and then just add the sum of the recent additions to it.
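
A minimal sketch of that idea, assuming an id column and treating the object names here as illustrative:

-- Pre-compute the total and remember how far it covers.
CREATE MATERIALIZED VIEW invoices_sum_mv AS
SELECT max(id) AS max_id, sum(value) AS total
FROM invoices;

-- At query time, add the rows inserted since the last refresh.
SELECT (SELECT total FROM invoices_sum_mv)
     + COALESCE((SELECT sum(value)
                 FROM invoices
                 WHERE id > (SELECT max_id FROM invoices_sum_mv)), 0)
       AS grand_total;

-- Periodically recompute the snapshot (e.g. from a scheduled job).
REFRESH MATERIALIZED VIEW invoices_sum_mv;

Between refreshes, only rows with id greater than the stored max_id need to be scanned, which stays fast given an index on id.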
