PostgreSQL: Is there a way to improve performance of SELECT queries using JSONB or HSTORE keys?
I have a large table (millions of rows) with a column of type JSONB / HSTORE that contains many fields (hundreds). For illustration, I use the following smaller and less complex table:
-- table with HSTORE column (the hstore extension must be installed)
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE test_hstore (id BIGSERIAL PRIMARY KEY, data HSTORE);
INSERT INTO test_hstore (data)
SELECT (
'key_1=>' || trunc(2 * random()) ||
', key_2=>' || trunc(2 * random()) ||
', key_3=>' || trunc(2 * random()))::HSTORE
FROM generate_series(0, 9999999) i;
-- table with JSONB column
CREATE TABLE test_jsonb (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO test_jsonb (data)
SELECT (
'{ "key_1":' || trunc(2 * random()) ||
', "key_2":' || trunc(2 * random()) ||
', "key_3":' || trunc(2 * random()) || '}')::JSONB
FROM generate_series(0, 9999999) i;
I would like to simply SELECT one or more fields within the data column, without using a WHERE clause. I get a decrease in performance with an increasing number of selected fields:
EXPLAIN ANALYSE
SELECT id FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=8) (actual time=0.049..3705.852 rows=10000000 loops=1)
--Planning time: 0.419 ms
--Execution time: 5445.654 ms
EXPLAIN ANALYSE
SELECT data FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=56) (actual time=0.083..2424.334 rows=10000000 loops=1)
--Planning time: 0.082 ms
--Execution time: 3856.972 ms
EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..238637.70 rows=10000056 width=32) (actual time=0.122..3263.937 rows=10000000 loops=1)
--Planning time: 0.052 ms
--Execution time: 5390.803 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..263637.84 rows=10000056 width=64) (actual time=0.089..3621.768 rows=10000000 loops=1)
--Planning time: 0.051 ms
--Execution time: 5334.452 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.98 rows=10000056 width=96) (actual time=0.086..4291.111 rows=10000000 loops=1)
--Planning time: 0.067 ms
--Execution time: 6375.229 ms
Same trend (even more pronounced) for the JSONB column type:
EXPLAIN ANALYSE
SELECT id FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=8) (actual time=0.028..4009.841 rows=10000000 loops=1)
--Planning time: 0.878 ms
--Execution time: 5867.604 ms
EXPLAIN ANALYSE
SELECT data FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=68) (actual time=0.074..2371.212 rows=10000000 loops=1)
--Planning time: 0.061 ms
--Execution time: 3787.308 ms
EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..258331.85 rows=9999828 width=32) (actual time=0.106..4677.026 rows=10000000 loops=1)
--Planning time: 0.066 ms
--Execution time: 6382.469 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..283331.42 rows=9999828 width=64) (actual time=0.094..6888.904 rows=10000000 loops=1)
--Planning time: 0.047 ms
--Execution time: 8593.060 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308330.99 rows=9999828 width=96) (actual time=0.173..9567.699 rows=10000000 loops=1)
--Planning time: 0.171 ms
--Execution time: 11262.135 ms
This becomes even more pronounced when the table contains many more fields. Is there a workaround?
Adding a GIN index doesn't seem to help:
CREATE INDEX ix_test_hstore ON test_hstore USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.00 rows=10000000 width=96) (actual time=0.045..4650.447 rows=10000000 loops=1)
--Planning time: 2.100 ms
--Execution time: 6746.631 ms
CREATE INDEX ix_test_jsonb ON test_jsonb USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308334.00 rows=10000000 width=96) (actual time=0.149..9807.012 rows=10000000 loops=1)
--Planning time: 0.131 ms
--Execution time: 11739.948 ms
There's actually not much you can do to improve access to a single key within an hstore value, or to a property of a JSONB value (which could be an object, an array, a string or a number; that variability might be the reason why retrieving it is more expensive than retrieving a value from an hstore).
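Part of the cost is that the -> operator returns another full JSONB value, which has to be decoded per row, while ->> returns plain text. A minimal illustration:
-- `->` yields jsonb (still a variant type), `->>` yields text
SELECT pg_typeof(data->'key_1')  AS arrow_result,   -- jsonb
       pg_typeof(data->>'key_1') AS text_result     -- text
FROM test_jsonb
LIMIT 1;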
An index could help you if you need to use data->'key_1' in a WHERE clause, but it will not make it any cheaper to retrieve the property from the data.
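For such a filter you would typically use an expression index rather than a GIN index; a sketch (the index name ix_test_jsonb_key_1 is just an example):
-- Expression index on the extracted text value; only queries that
-- filter on the very same expression can use it.
CREATE INDEX ix_test_jsonb_key_1 ON test_jsonb ((data->>'key_1'));
SELECT count(*) FROM test_jsonb WHERE data->>'key_1' = '1';

-- The GIN index from the question is useful for containment queries:
SELECT count(*) FROM test_jsonb WHERE data @> '{"key_1": 1}';
Note that with only two distinct values per key in the test data, the planner may still prefer a sequential scan; such indexes pay off for selective predicates.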
The best course of action, if you always (or frequently) use a certain key_1, would be to normalise your data and add a column named key_1. If your data source makes it very easy for you to store data, but not so easy to store key_1 separately, you can have a trigger function take care (on insert or update) of populating the column key_1 from the value of data:
CREATE TABLE test_jsonb
(
    id    BIGSERIAL PRIMARY KEY,
    data  JSONB,
    key_1 integer
);

-- keep key_1 in sync with data->>'key_1' on every insert/update
CREATE OR REPLACE FUNCTION ins_upd_test_data()
RETURNS trigger AS
$$
BEGIN
    new.key_1 := (new.data->>'key_1')::integer;
    RETURN new;
END;
$$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER ins_upd_test_jsonb_trigger
BEFORE INSERT OR UPDATE OF data
ON test_jsonb FOR EACH ROW
EXECUTE PROCEDURE ins_upd_test_data();
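A quick sanity check (with made-up sample values), plus a one-time backfill for rows that already existed before the trigger was created:
-- the BEFORE INSERT trigger populates key_1 automatically
INSERT INTO test_jsonb (data) VALUES ('{"key_1": 1, "key_2": 0, "key_3": 1}');
SELECT id, key_1 FROM test_jsonb ORDER BY id DESC LIMIT 1;

-- one-time backfill for pre-existing rows; this UPDATE touches only
-- key_1, so it does not re-fire the UPDATE OF data trigger
UPDATE test_jsonb
SET key_1 = (data->>'key_1')::integer
WHERE key_1 IS NULL;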
This way, you can retrieve key_1 with the same efficiency with which you can retrieve id.
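You can verify this by comparing the plans yourself (timings omitted here, as they depend on your setup):
EXPLAIN ANALYSE SELECT id FROM test_jsonb;
EXPLAIN ANALYSE SELECT key_1 FROM test_jsonb;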