
How to speed up postgres query on large table

I have a table with ~1.4 million rows. Each row has about 5 columns of general info and a 6th column holding ~1700 JSON key/value pairs.

I am building summaries grouped by a column called ownership, selecting only rows where a specific key exists in the JSON. The query below runs in 14.5 s:

SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999g999')) AS total
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
GROUP BY ownership;

My real queries will be much larger, and I know I'll need to select on many keys from the jsonfield. For example, if I add a second key, the query time increases to 22.9 s:

SELECT ownership,
       SUM(TO_NUMBER(jsonfield->>'firstvalue', '9G999g999')) AS total,
       SUM(TO_NUMBER(jsonfield->>'secondvalue', '9G999g999')) AS totaltwo
FROM mytable
WHERE jsonfield->>'firstvalue' IS NOT NULL
   OR jsonfield->>'secondvalue' IS NOT NULL
GROUP BY ownership;

There may be cases where I'll need to query on several hundred potential keys in the jsonfield. Any suggestions on how to optimize these queries?

Great answer below. As an FYI, I had to convert my json to jsonb before I could create the index. I first created a copy of the json column, called jsonbsummary, and then converted that copy to jsonb like this:

ALTER TABLE mytable
  ALTER COLUMN jsonbsummary
  SET DATA TYPE jsonb
  USING jsonbsummary::jsonb;
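
The copy step that precedes this ALTER is not shown above. A minimal sketch of how it might look, assuming the original column is the jsonfield column (type json) used in the earlier queries; the index name idx_jsonbsummary is hypothetical:

```sql
-- Assumption: jsonfield is the original json column from the queries above.
-- Run this BEFORE the ALTER ... SET DATA TYPE jsonb statement.
ALTER TABLE mytable ADD COLUMN jsonbsummary json;
UPDATE mytable SET jsonbsummary = jsonfield;

-- Alternative (also an assumption): create the copy as jsonb directly,
-- which makes the separate type conversion unnecessary.
-- ALTER TABLE mytable ADD COLUMN jsonbsummary jsonb;
-- UPDATE mytable SET jsonbsummary = jsonfield::jsonb;

-- After the column is jsonb, build the GIN index on it.
-- idx_jsonbsummary is a hypothetical name.
CREATE INDEX idx_jsonbsummary ON mytable USING GIN (jsonbsummary);
```

On a table of this size the UPDATE rewrites every row, so it is probably worth running during a quiet period.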

As an additional FYI: the grouped queries that originally took 22+ seconds now run in 200 ms with the GIN index. The rewritten query:

SELECT ownership,
       SUM(TO_NUMBER(jsonbsummary->>'firstvalue', '9G999g999')) AS total,
       SUM(TO_NUMBER(jsonbsummary->>'secondvalue', '9G999g999')) AS totaltwo
FROM mytable
WHERE jsonbsummary ?| array['firstvalue', 'secondvalue']
GROUP BY ownership;

You need a GIN index on the JSONB column.

CREATE INDEX idx_json ON mytable USING GIN (jsoncolumn);

To check for the existence of keys, use the ?| operator, which can make use of that index:

select ...
from mytable
where jsoncolumn ?| array['firstvalue', 'secondvalue'];
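
To confirm that the index is actually used, the plan can be inspected with EXPLAIN. A small sketch, assuming the table, column, and index names from the statements above (mytable, jsoncolumn, idx_json):

```sql
-- Assumes mytable.jsoncolumn is jsonb and idx_json exists as created above.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM mytable
WHERE jsoncolumn ?| array['firstvalue', 'secondvalue'];
-- A "Bitmap Index Scan on idx_json" node in the output means the GIN index
-- was used; a "Seq Scan on mytable" means it was not.
```

The planner may still prefer a sequential scan when most rows contain the requested keys, so the gain depends on how selective the keys are. Note also that the ?| operator is supported by the default GIN operator class (jsonb_ops) but not by jsonb_path_ops.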

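That is equivalent to your OR condition, with one difference: ?| also matches a key whose value is JSON null, whereas ->> returns NULL for such a key and the IS NOT NULL test excludes it. If you want to find rows that contain all of those keys, use the ?& operator instead.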
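
For illustration, the "all keys" and single-key variants might look like this; a sketch using the same assumed names (mytable, jsoncolumn) and the ownership column from the question:

```sql
-- Rows that contain BOTH keys (AND semantics):
SELECT ownership, count(*) AS n
FROM mytable
WHERE jsoncolumn ?& array['firstvalue', 'secondvalue']
GROUP BY ownership;

-- Rows that contain one specific key:
SELECT count(*)
FROM mytable
WHERE jsoncolumn ? 'firstvalue';
```

Both ?& and ? can use the same default GIN index as ?|.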
