
JSONB performance degrades as the number of keys increases

I am testing the performance of the jsonb datatype in PostgreSQL. Each document will have about 1500 keys that are NOT hierarchical; the document is flattened. Here is what the table and a document look like.

create table ztable0
(
   id serial primary key,
   data jsonb
)

Here is a sample document:

{ "0": 301, "90": 23, "61": 4001, "11": 929} ...

As you can see, the document does not contain hierarchies and all values are integers. However, some values will be text in the future.

  • Rows: 86,000
  • Columns: 2
  • Keys in document: 1500+
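
For anyone who wants to reproduce this, data of a similar shape can be generated with something along the following lines (the value range is arbitrary; json_object_agg is used because it is available in 9.4):

-- one flat document per row, with 1500 integer-valued keys "0" .. "1499"
do $$
begin
  for i in 1 .. 86000 loop
    insert into ztable0 (data)
    select json_object_agg(k::text, (random() * 5000)::int)::jsonb
    from generate_series(0, 1499) as k;
  end loop;
end
$$;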

When searching for a particular value of a key or performing a group by, the performance is noticeably slow. This query:

select (data ->> '1')::integer, count(*) from ztable0
group by (data ->> '1')::integer
limit 100

took about 2 seconds to complete. Is there any way to improve the performance of jsonb documents?

This is a known issue in 9.4beta2; please have a look at this blog post, which contains some details and pointers to the mailing list threads.

About the issue

PostgreSQL uses TOAST to store data values, which means that big values (typically around 2 kB and more) are stored in a separate, special kind of table. PostgreSQL also tries to compress the data using its pglz method (which has been around for ages). By "tries" it means that before deciding whether to compress the data, the first 1 kB is probed. If the results are not satisfactory, i.e. compression gives no benefit on the probed data, the decision is made not to compress.
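
A quick way to check whether stored values are actually being compressed is to compare the on-disk datum size with the length of the text form (the text length is only an approximation of the raw binary size):

-- pg_column_size() reports the stored (possibly compressed and/or toasted) size
select pg_column_size(data)     as stored_bytes,
       octet_length(data::text) as text_bytes
from ztable0
limit 5;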

So, the initial JSONB format stored a table of offsets at the beginning of its value. For values with a high number of root keys in the JSON, this resulted in the first 1 kB (and more) being occupied by offsets. This was a series of distinct data, i.e. it was not possible to find two adjacent 4-byte sequences that would be equal. Thus, no compression.

Note that if one were to skip over the offset table, the rest of the value is perfectly compressible. So one of the options would be to tell the pglz code explicitly whether compression is applicable and where to probe for it (especially for newly introduced data types), but the existing infrastructure doesn't support this.
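
A much coarser knob that does exist is the per-column storage strategy: setting it to EXTERNAL keeps the value out-of-line in TOAST but makes PostgreSQL skip the compression attempt for that column entirely, trading disk space for predictable access cost. For example:

-- store the jsonb column out-of-line, uncompressed (no pglz attempt)
alter table ztable0 alter column data set storage external;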

The fix

So the decision was made to change the way data is stored inside the JSONB value, making it more suitable for pglz to compress. Here's a commit message by Tom Lane with the change that implements the new JSONB on-disk format. Despite the format changes, lookup of a random element is still O(1).

It took around a month to be fixed, though. As far as I can see, 9.4beta3 has already been tagged, so you'll be able to re-test this soon, after the official announcement.

Important note: you'll have to go through the pg_dump / pg_restore exercise or use the pg_upgrade tool to switch to 9.4beta3, as the fix for the issue you've identified required changes in the way data is stored, so beta3 is not binary compatible with beta2.
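
In the meantime, and independently of the on-disk format issue, queries that repeatedly extract the same key can often be helped by an expression index; a minimal sketch (the index name is just an example):

create index ztable0_key1_idx on ztable0 (((data ->> '1')::integer));

-- selective filters on the indexed expression can then use the index
select count(*) from ztable0 where (data ->> '1')::integer = 929;

A full-table group by like the one in the question will still read every row, but point lookups and selective filters on that key benefit.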
