
Postgres query with IN is very slow

I have a table with an index on (column A, column B), and I'm running a query that looks like this:

SELECT * FROM table WHERE (A, B) IN ((a_1, b_1), (a_2, b_2), ..., (a_5000, b_5000))

This query is very slow! The plan looks like:

Bitmap Heap Scan on table
  Recheck Cond: (((A = a_1) AND (B = b_1)) OR ((A = a_2) AND (B = b_2)) OR ...
  ->  BitmapOr
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_1) AND (B = b_1))
        ->  Bitmap Index Scan on idx
              Index Cond: ((A = a_2) AND (B = b_2))
        ...(5000 other Bitmap Index Scans)

Instead of doing one index scan with 5000 values, Postgres seems to be doing 5000 index scans with one value at a time, which explains why the query is so slow.

Actually, it is way faster to do something like:

SELECT * FROM table WHERE A IN (a_1, ..., a_5000)

fetch the results and then filter on column B inside the app (python).
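
For reference, the app-side filtering step looks roughly like the sketch below. It assumes each fetched row is a tuple whose first two fields are A and B; the function name and sample rows are made up for illustration.

```python
def filter_pairs(rows, wanted_pairs):
    """Keep only the rows whose (A, B) values appear in wanted_pairs.

    Assumption: each row is a tuple with A at index 0 and B at index 1.
    """
    wanted = set(wanted_pairs)  # set lookup avoids scanning the list per row
    return [row for row in rows if (row[0], row[1]) in wanted]


# Hypothetical rows, as they might come back from the "A IN (...)" query.
rows = [(1, "x", "payload-1"), (1, "y", "payload-2"), (2, "x", "payload-3")]
filtered = filter_pairs(rows, [(1, "x"), (2, "x")])
print(filtered)
```

This works, but it transfers every row that matches on A alone, which is why I'd rather have Postgres do the filtering.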

I'd really prefer to have the results already filtered by Postgres with a reasonable running time. Is there a workaround?

Try joining to a CTE:

with value_list (a,b) as (
  values 
      (a_1, b_1), 
      (a_2, b_2), ..., 
      (a_5000, b_5000) 
)
select *
from table t
  join value_list v on (t.a, t.b) = (v.a, v.b);

(This assumes you have no duplicates in the list of values)
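
Since the question mentions Python, here is a minimal sketch of how the query above could be generated for a list of pairs. It only builds the SQL string and the parameter list; it assumes a driver that uses `%s` placeholders (psycopg2 does), and the function name and the `my_table` default are hypothetical. Duplicate pairs are dropped first, so the no-duplicates assumption holds.

```python
def build_values_join(pairs, table="my_table"):
    """Build the CTE join query plus a flat parameter list.

    Assumptions: `table` is a trusted identifier (it is interpolated
    directly into the SQL), and the driver uses %s placeholders.
    """
    # Drop duplicate pairs while keeping their original order.
    unique_pairs = list(dict.fromkeys(pairs))
    if not unique_pairs:
        raise ValueError("at least one (a, b) pair is required")

    # One "(%s, %s)" group per pair in the VALUES list.
    placeholders = ", ".join(["(%s, %s)"] * len(unique_pairs))
    sql = (
        f"WITH value_list (a, b) AS (VALUES {placeholders}) "
        f"SELECT t.* FROM {table} t "
        "JOIN value_list v ON (t.a, t.b) = (v.a, v.b)"
    )
    # Flatten [(a1, b1), (a2, b2)] into [a1, b1, a2, b2].
    params = [value for pair in unique_pairs for value in pair]
    return sql, params


sql, params = build_values_join([(1, 2), (3, 4), (1, 2)])
print(sql)
print(params)
```

The result would then be passed to something like `cursor.execute(sql, params)`. Depending on the column types, the values may need explicit casts in the VALUES list.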

I accidentally repeated one WHERE condition at the end of the query, and my query, which used to take almost a minute, now returns within 1 second. I don't know the reason behind it, but it's faster now.
