简体   繁体   中英

Why does my query using LIKE perform a seq scan?

I have a table with 2.2 Million rows.

                                     Table "public.index"
  Column   |            Type             |                      Modifiers                      
-----------+-----------------------------+-----------------------------------------------------
 fid       | integer                     | not null default nextval('index_fid_seq'::regclass)
 location  | character varying           | 
Indexes:
    "index_pkey" PRIMARY KEY, btree (fid)
    "location_index" btree (location text_pattern_ops)

The location is the full path to a file, but I need to query using the name of the folder the file is located in. That folder name is unique in the table.

To avoid % at the beginning, I search for the full path which I know:

select fid from index where location like '/path/to/folder/%'

Explain Analyze:

    QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on index  (cost=0.00..120223.34 rows=217 width=4) (actual time=1181.701..1181.701 rows=0 loops=1)
   Filter: ((location)::text ~~ '/path/to/folder/%'::text)
   Rows Removed by Filter: 2166034
 Planning time: 0.954 ms
 Execution time: 1181.748 ms
(5 rows)

The question is not how to make a workaround, because I have found that for my case:

When creating a foldername_index

create index on index (substring(location, '(?<=/path/to/)[^\/]*');

I can succesfully use the folder_name to query:

explain analyze select fid from index where substring(location, '(?<=/path/to/)[^\/]*') = 'foldername';

    QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on index  (cost=600.49..31524.74 rows=10830 width=12) (actual time=0.030..0.030 rows=1 loops=1)
   Recheck Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
   Heap Blocks: exact=1
   ->  Bitmap Index Scan on foldername_index  (cost=0.00..597.78 rows=10830 width=0) (actual time=0.023..0.023 rows=1 loops=1)
         Index Cond: ("substring"((location)::text, '(?<=/path/to/)[^\/]*'::text) = 'folder_name'::text)
 Planning time: 0.115 ms
 Execution time: 0.059 ms
(7 rows)

I have followed the PostgreSQL FAQ :

When using wild-card operators such as LIKE or ~, indexes can only be used in certain circumstances:

The beginning of the search string must be anchored to the start of the string, ie

LIKE patterns must not start with % or _.

The search string can not start with a character class, eg [ae].

Everyting not the case in my query.

C locale must be used during initdb because sorting in a non-C locale often doesn't match the behavior of LIKE. You can create a special text_pattern_ops index that will work in such cases, but note it is only helpful for LIKE indexing.

I have C Locale:

# show LC_COLLATE;
 lc_collate 
------------
 C
(1 row)

I also followed the instructions from this great answer here on Stack Overflow, which is why I use text_pattern_ops which did not change anything. Unfortunately, I cannot install new modules.

So: Why does my query perform a seq scan?

I have found the solution myself by thinking about it over and over again. Although it might be obvious for some people, it may help others:

/path/to/folder is actually /the_path/to/folder/ (There are underscores in the path). But _ is a wildcard in SQL (like % ).

select fid from index where location like '/the_path/to/folder/%'

Uses seq scan because the index cannot filter any rows, because the part up to the underscore is the same for all rows.

select fid from index where location like '/the\_path/to/folder/%'

uses index scan.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM