PostgreSQL: Is there a way to improve performance of SELECT queries using JSONB or HSTORE keys?
I have a large table (millions of rows) with a column of type JSONB / HSTORE that contains many fields (hundreds). For illustration, I use the following smaller and less complex table:
-- table with HSTORE column (the hstore extension must be installed)
CREATE EXTENSION IF NOT EXISTS hstore;
CREATE TABLE test_hstore (id BIGSERIAL PRIMARY KEY, data HSTORE);
INSERT INTO test_hstore (data)
SELECT (
'key_1=>' || trunc(2 * random()) ||
', key_2=>' || trunc(2 * random()) ||
', key_3=>' || trunc(2 * random()))::HSTORE
FROM generate_series(0, 9999999) i;
-- table with JSONB column
CREATE TABLE test_jsonb (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO test_jsonb (data)
SELECT (
'{ "key_1":' || trunc(2 * random()) ||
', "key_2":' || trunc(2 * random()) ||
', "key_3":' || trunc(2 * random()) || '}')::JSONB
FROM generate_series(0, 9999999) i;
I would like to simply SELECT one or more fields within the data column, without using a WHERE clause. I get a decrease in performance with an increasing number of selected fields:
EXPLAIN ANALYSE
SELECT id FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=8) (actual time=0.049..3705.852 rows=10000000 loops=1)
--Planning time: 0.419 ms
--Execution time: 5445.654 ms
EXPLAIN ANALYSE
SELECT data FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..213637.56 rows=10000056 width=56) (actual time=0.083..2424.334 rows=10000000 loops=1)
--Planning time: 0.082 ms
--Execution time: 3856.972 ms
EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..238637.70 rows=10000056 width=32) (actual time=0.122..3263.937 rows=10000000 loops=1)
--Planning time: 0.052 ms
--Execution time: 5390.803 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..263637.84 rows=10000056 width=64) (actual time=0.089..3621.768 rows=10000000 loops=1)
--Planning time: 0.051 ms
--Execution time: 5334.452 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.98 rows=10000056 width=96) (actual time=0.086..4291.111 rows=10000000 loops=1)
--Planning time: 0.067 ms
--Execution time: 6375.229 ms
Same trend (even more pronounced) for the JSONB column type:
EXPLAIN ANALYSE
SELECT id FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=8) (actual time=0.028..4009.841 rows=10000000 loops=1)
--Planning time: 0.878 ms
--Execution time: 5867.604 ms
EXPLAIN ANALYSE
SELECT data FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..233332.28 rows=9999828 width=68) (actual time=0.074..2371.212 rows=10000000 loops=1)
--Planning time: 0.061 ms
--Execution time: 3787.308 ms
EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..258331.85 rows=9999828 width=32) (actual time=0.106..4677.026 rows=10000000 loops=1)
--Planning time: 0.066 ms
--Execution time: 6382.469 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..283331.42 rows=9999828 width=64) (actual time=0.094..6888.904 rows=10000000 loops=1)
--Planning time: 0.047 ms
--Execution time: 8593.060 ms
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308330.99 rows=9999828 width=96) (actual time=0.173..9567.699 rows=10000000 loops=1)
--Planning time: 0.171 ms
--Execution time: 11262.135 ms
This becomes even more pronounced when the table contains many more fields. Is there a workaround?
Adding a GIN index doesn't seem to help:
CREATE INDEX ix_test_hstore ON test_hstore USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore (cost=0.00..288637.00 rows=10000000 width=96) (actual time=0.045..4650.447 rows=10000000 loops=1)
--Planning time: 2.100 ms
--Execution time: 6746.631 ms
CREATE INDEX ix_test_jsonb ON test_jsonb USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb (cost=0.00..308334.00 rows=10000000 width=96) (actual time=0.149..9807.012 rows=10000000 loops=1)
--Planning time: 0.131 ms
--Execution time: 11739.948 ms
There's actually not much you can do to improve access to a single key within an hstore value, or to a property of a JSONB value (which could be an object, an array, a string or a number; that variability might be the reason why retrieving it is more expensive than retrieving a value from an hstore).
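Part of the cost is that the -> operator returns another full JSONB value, which has to be decoded per row, while ->> returns plain text. A minimal illustration:
-- `->` yields jsonb (still a variant type), `->>` yields text
SELECT pg_typeof(data->'key_1')  AS arrow_result,   -- jsonb
       pg_typeof(data->>'key_1') AS text_result     -- text
FROM test_jsonb
LIMIT 1;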
An index could help you if you need to use data->'key_1' in a WHERE clause, but it will not make it any cheaper to retrieve the property from the data.
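For such a filter you would typically use an expression index rather than a GIN index; a sketch (the index name ix_test_jsonb_key_1 is just an example):
-- Expression index on the extracted text value; only queries that
-- filter on the very same expression can use it.
CREATE INDEX ix_test_jsonb_key_1 ON test_jsonb ((data->>'key_1'));
SELECT count(*) FROM test_jsonb WHERE data->>'key_1' = '1';

-- The GIN index from the question is useful for containment queries:
SELECT count(*) FROM test_jsonb WHERE data @> '{"key_1": 1}';
Note that with only two distinct values per key in the test data, the planner may still prefer a sequential scan; such indexes pay off for selective predicates.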
The best course of action, if you always (or frequently) use a certain key_1, would be to normalise your data and add a column named key_1. If your data source makes it very easy for you to store data, but not so easy to store key_1 separately, you can have a trigger function take care (on insert or update) of populating the column key_1 from the value of data:
CREATE TABLE test_jsonb
(
    id    BIGSERIAL PRIMARY KEY,
    data  JSONB,
    key_1 integer
);

-- keep key_1 in sync with data->>'key_1' on every insert/update
CREATE OR REPLACE FUNCTION ins_upd_test_data()
RETURNS trigger AS
$$
BEGIN
    new.key_1 := (new.data->>'key_1')::integer;
    RETURN new;
END;
$$
LANGUAGE plpgsql VOLATILE;

CREATE TRIGGER ins_upd_test_jsonb_trigger
BEFORE INSERT OR UPDATE OF data
ON test_jsonb FOR EACH ROW
EXECUTE PROCEDURE ins_upd_test_data();
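A quick sanity check (with made-up sample values), plus a one-time backfill for rows that already existed before the trigger was created:
-- the BEFORE INSERT trigger populates key_1 automatically
INSERT INTO test_jsonb (data) VALUES ('{"key_1": 1, "key_2": 0, "key_3": 1}');
SELECT id, key_1 FROM test_jsonb ORDER BY id DESC LIMIT 1;

-- one-time backfill for pre-existing rows; this UPDATE touches only
-- key_1, so it does not re-fire the UPDATE OF data trigger
UPDATE test_jsonb
SET key_1 = (data->>'key_1')::integer
WHERE key_1 IS NULL;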
This way, you can retrieve key_1 with the same efficiency with which you can retrieve id.
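You can verify this by comparing the plans yourself (timings omitted here, as they depend on your setup):
EXPLAIN ANALYSE SELECT id FROM test_jsonb;
EXPLAIN ANALYSE SELECT key_1 FROM test_jsonb;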