LIKE search performance with a GIN index on a JSONB column

I am relatively new to Postgres.

I have a JSONB column with sample data as below.

{
    "book_data": {
        "author": "abcd",
        "title": "This is a literature book",
        "year": "2021",
        "price":2000,
        "noofpages":100,
        "subject": "Language"
    }
}

Users can search by the value of author, year, price, subject, or title. A title search can be full or partial: the user can supply the entire title ("This is a literature book") or only part of it (say, "literature"). We have a GIN index on the column. The query performs well when we do an exact match on the value.

select * FROM book_ms.book_data a
WHERE a.book_details  @@ '$.book_data.author == "abcd" 
&& $.book_data.title == "This is a literature book"'

But a partial match like the one below takes a long time. Partial search is needed only on title; for the other fields it will always be an exact match. Is there any way I can improve the speed of the like search?

select * FROM book_ms.book_data a
WHERE a.book_details  @@ '$.book_data.author == "abcd" 
&& $.book_data.title like_regex "litreature"'

Do I have to use a text index for this? If so, is there something equivalent to the section group used in an Oracle Text index?

Below is the EXPLAIN (ANALYZE, BUFFERS) output for the partial (like_regex) match:

"Bitmap Heap Scan on book_data a  (cost=71.57..1784.58 rows=461 width=935) (actual time=63.726..600.494 rows=3430 loops=1)"
"  Recheck Cond: (book_details @@ '($.""book_data"".""author"" == ""abcd"" && $.""book_data"".""title"" like_regex ""litreature"")'::jsonpath)"
"  Rows Removed by Index Recheck: 198778"
"  Heap Blocks: exact=30016"
"  Buffers: shared hit=30241"
"  ->  Bitmap Index Scan on search_ult_text_ndx_3  (cost=0.00..71.45 rows=461 width=0) (actual time=57.008..57.009 rows=202208 loops=1)"
"        Index Cond: (book_details @@ '($.""book_data"".""author"" == ""abcd"" && $.""book_data"".""title"" like_regex ""litreature"")'::jsonpath)"
"        Buffers: shared hit=225"
"Planning Time: 0.216 ms"
"Execution Time: 601.197 ms"
"Note: This is not an Approved plan.  No usable Approved plan was found."
"SQL Hash: -403339798, Plan Hash: -176285915"

And below is the output for the exact match:

"Bitmap Heap Scan on book_data a  (cost=135.57..1848.58 rows=461 width=935) (actual time=23.597..23.707 rows=25 loops=1)"
"  Recheck Cond: (book_details @@ '($.""book_data"".""author"" == ""abcd"" && $.""book_data"".""title"" == ""This is a literature book"")'::jsonpath)"
"  Heap Blocks: exact=24"
"  Buffers: shared hit=137 read=17"
"  I/O Timings: read=21.189"
"  ->  Bitmap Index Scan on search_ult_text_ndx_3  (cost=0.00..135.45 rows=461 width=0) (actual time=23.572..23.572 rows=25 loops=1)"
"        Index Cond: (book_details @@ '($.""book_data"".""author"" == ""abcd"" && $.""book_data"".""title"" == ""This is a literature book"")'::jsonpath)"
"        Buffers: shared hit=113 read=17"
"        I/O Timings: read=21.189"
"Planning Time: 0.207 ms"
"Execution Time: 23.732 ms"
"Note: This is not an Approved plan.  No usable Approved plan was found."
"SQL Hash: -403339798, Plan Hash: -176285915"

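No index can speed up the like_regex part of this WHERE condition. In the plan for the partial match, the GIN index is used only for the equality on author (the bitmap index scan returns 202208 rows); the regular expression is then evaluated against the heap rows during the recheck, which discards 198778 of them.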

If the path is always {book_data,title}, you could write the partial match as:

WHERE book_details -> 'book_data' ->> 'title' LIKE '%litreature%'

Then a trigram index on book_details -> 'book_data' ->> 'title' could be used:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE INDEX ON book_ms.book_data USING gin
   ((book_details -> 'book_data' ->> 'title') gin_trgm_ops);
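
The question combines an exact match on author with a partial match on title, so the two conditions can be split: keep the jsonpath equality for author and use LIKE on the extracted title. This is only a sketch. It assumes a trigram index exists on the expression book_details -> 'book_data' ->> 'title', and it reuses the search string from the question. Whether the planner actually combines both indexes depends on the data and statistics, so it is worth checking the plan:

```sql
-- Sketch: exact match on author through jsonpath (can use the existing GIN
-- index on book_details), partial match on title through LIKE (can use a
-- pg_trgm GIN index on the same expression, assumed to exist).
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM book_ms.book_data a
WHERE a.book_details @@ '$.book_data.author == "abcd"'
  AND (a.book_details -> 'book_data' ->> 'title') LIKE '%litreature%';
```

The expression in the WHERE clause has to match the indexed expression for the trigram index to be considered. LIKE is case-sensitive; gin_trgm_ops also supports ILIKE if a case-insensitive search is wanted. Trigram indexes help most when the search pattern contains at least three consecutive characters.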
