PostgreSQL: Is there a way to improve the performance of SELECT queries that use JSONB or HSTORE keys?

Question

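I have a large table with many rows (millions) that has a column of type JSONB / HSTORE containing many fields (hundreds). For illustration, I use the following smaller and less complex tables: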

-- table with HSTORE column
CREATE TABLE test_hstore (id BIGSERIAL PRIMARY KEY, data HSTORE);
INSERT INTO test_hstore (data)
SELECT hstore(
    '  key_1=>' || trunc(2 * random()) ||
    ', key_2=>' || trunc(2 * random()) ||
    ', key_3=>' || trunc(2 * random()))
FROM generate_series(0, 9999999) i;

-- table with JSONB column
CREATE TABLE test_jsonb (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO test_jsonb (data)
SELECT (
    '{ "key_1":' || trunc(2 * random()) ||
    ', "key_2":' || trunc(2 * random()) ||
    ', "key_3":' || trunc(2 * random()) || '}')::JSONB
FROM generate_series(0, 9999999) i;

I simply want to SELECT one or more fields from the data column, without a WHERE clause. Performance degrades as the number of selected fields increases:

EXPLAIN ANALYSE
SELECT id FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..213637.56 rows=10000056 width=8) (actual time=0.049..3705.852 rows=10000000 loops=1)
--Planning time: 0.419 ms
--Execution time: 5445.654 ms

EXPLAIN ANALYSE
SELECT data FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..213637.56 rows=10000056 width=56) (actual time=0.083..2424.334 rows=10000000 loops=1)
--Planning time: 0.082 ms
--Execution time: 3856.972 ms

EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..238637.70 rows=10000056 width=32) (actual time=0.122..3263.937 rows=10000000 loops=1)
--Planning time: 0.052 ms
--Execution time: 5390.803 ms


EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..263637.84 rows=10000056 width=64) (actual time=0.089..3621.768 rows=10000000 loops=1)
--Planning time: 0.051 ms
--Execution time: 5334.452 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..288637.98 rows=10000056 width=96) (actual time=0.086..4291.111 rows=10000000 loops=1)
--Planning time: 0.067 ms
--Execution time: 6375.229 ms

The same trend (even more pronounced) shows up with the JSONB column type:

EXPLAIN ANALYSE
SELECT id FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..233332.28 rows=9999828 width=8) (actual time=0.028..4009.841 rows=10000000 loops=1)
--Planning time: 0.878 ms
--Execution time: 5867.604 ms

EXPLAIN ANALYSE
SELECT data FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..233332.28 rows=9999828 width=68) (actual time=0.074..2371.212 rows=10000000 loops=1)
--Planning time: 0.061 ms
--Execution time: 3787.308 ms

EXPLAIN ANALYSE
SELECT data->'key_1' FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..258331.85 rows=9999828 width=32) (actual time=0.106..4677.026 rows=10000000 loops=1)
--Planning time: 0.066 ms
--Execution time: 6382.469 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2' FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..283331.42 rows=9999828 width=64) (actual time=0.094..6888.904 rows=10000000 loops=1)
--Planning time: 0.047 ms
--Execution time: 8593.060 ms

EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..308330.99 rows=9999828 width=96) (actual time=0.173..9567.699 rows=10000000 loops=1)
--Planning time: 0.171 ms
--Execution time: 11262.135 ms

This becomes even more noticeable when the table contains more fields. Is there a workaround?

Adding a GIN index does not seem to help:

CREATE INDEX ix_test_hstore ON test_hstore USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_hstore;
--Seq Scan on test_hstore  (cost=0.00..288637.00 rows=10000000 width=96) (actual time=0.045..4650.447 rows=10000000 loops=1)
--Planning time: 2.100 ms
--Execution time: 6746.631 ms

CREATE INDEX ix_test_jsonb ON test_jsonb USING GIN (data);
EXPLAIN ANALYSE
SELECT data->'key_1', data->'key_2', data->'key_3' FROM test_jsonb;
--Seq Scan on test_jsonb  (cost=0.00..308334.00 rows=10000000 width=96) (actual time=0.149..9807.012 rows=10000000 loops=1)
--Planning time: 0.131 ms
--Execution time: 11739.948 ms

Answer 1

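There is really not much you can do to speed up retrieving one key out of an hstore, or one property out of JSON data (which can be an array, a string, or a number; this is probably why retrieving it is more expensive than retrieving a value from an hstore).

If you needed to use data->'key_1' in a WHERE clause, an index could help you, but it will not make retrieving the property from the data any cheaper.

If you always (or very often) use a certain key_1, the best course of action is to normalize your data and create a column named key_1. If your data source makes it easy to store data but not so easy to store key_1, you can use a trigger function (on insert or update) to populate the column key_1 from the data value: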
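To make the WHERE-clause case concrete, here is a minimal sketch of the two index styles that can help filtering. The table `demo_filter` and the index names are hypothetical and exist only for this illustration; whether the planner actually picks either index depends on your data and statistics.

```sql
-- Hypothetical demo table (not part of the question's schema).
CREATE TABLE demo_filter (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO demo_filter (data)
SELECT jsonb_build_object('key_1', i % 1000)
FROM generate_series(1, 100000) AS i;

-- B-tree expression index on a single extracted key.
-- The extra parentheses around the cast expression are required.
CREATE INDEX ix_demo_filter_key_1
    ON demo_filter (((data->>'key_1')::integer));

-- GIN index on the whole document; supports containment (@>) lookups.
CREATE INDEX ix_demo_filter_gin ON demo_filter USING GIN (data);

ANALYZE demo_filter;

-- A query can only use the expression index if it repeats the same expression.
EXPLAIN
SELECT id FROM demo_filter WHERE (data->>'key_1')::integer = 42;

-- Containment query that the GIN index can serve.
EXPLAIN
SELECT id FROM demo_filter WHERE data @> '{"key_1": 42}';
```

Both indexes only narrow down which rows are read. A query without a WHERE clause, like the ones in the question, still has to scan every row and extract the key from each one.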

CREATE TABLE test_jsonb 
(
    id BIGSERIAL PRIMARY KEY, 
    data JSONB, 
    key_1 integer
);

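-- Trigger function: copy key_1 out of the JSONB document into its own column.
-- LEAKPROOF was dropped here: only a superuser may set it, and it brings
-- no benefit for a trigger function.
CREATE OR REPLACE FUNCTION ins_upd_test_data() 
RETURNS trigger AS
$$
BEGIN
    new.key_1 = (new.data->>'key_1')::integer ;
    RETURN new ;
END ;
$$
LANGUAGE plpgsql VOLATILE;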

CREATE TRIGGER ins_upd_test_jsonb_trigger 
    BEFORE INSERT OR UPDATE OF data
    ON test_jsonb FOR EACH ROW
    EXECUTE PROCEDURE ins_upd_test_data();

This way, you can retrieve key_1 as efficiently as you retrieve id.
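The question's table already exists and is populated, so in practice the column would be added with ALTER TABLE and backfilled once rather than created fresh. The following is a rough, self-contained sketch of that sequence on a small stand-in table; `demo_jsonb`, `demo_sync_key_1` and the trigger name are hypothetical names chosen for this example.

```sql
-- Hypothetical small table standing in for the question's test_jsonb.
CREATE TABLE demo_jsonb (id BIGSERIAL PRIMARY KEY, data JSONB);
INSERT INTO demo_jsonb (data)
SELECT jsonb_build_object('key_1', i % 2, 'key_2', i % 3)
FROM generate_series(1, 1000) AS i;

-- Add the extracted column to the already populated table.
ALTER TABLE demo_jsonb ADD COLUMN key_1 integer;

-- One-off backfill for the rows that existed before the trigger.
UPDATE demo_jsonb SET key_1 = (data->>'key_1')::integer;

-- Keep the column in sync for future inserts and updates of data.
CREATE FUNCTION demo_sync_key_1() RETURNS trigger AS
$$
BEGIN
    NEW.key_1 := (NEW.data->>'key_1')::integer;
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER demo_jsonb_sync_key_1
    BEFORE INSERT OR UPDATE OF data
    ON demo_jsonb FOR EACH ROW
    EXECUTE PROCEDURE demo_sync_key_1();

-- A new row gets key_1 filled in by the trigger.
INSERT INTO demo_jsonb (data) VALUES ('{"key_1": 7}');

-- Sanity check: counts rows whose column disagrees with the document
-- (should return 0).
SELECT count(*)
FROM demo_jsonb
WHERE key_1 IS DISTINCT FROM (data->>'key_1')::integer;
```

On a table with millions of rows the backfill UPDATE rewrites every row, so it is worth scheduling or batching. On PostgreSQL 12 or later, a stored generated column such as `key_1 integer GENERATED ALWAYS AS ((data->>'key_1')::integer) STORED` is an alternative to the trigger; that is a different technique from the one in this answer.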

Solution 1
1 2017-01-18 18:54:56