postgres select large table timeout

Question

I have a table containing about 1 million records. When I run select * from table, it times out, and I see the query is in the wait state IO: DataFileRead. When I run select * from table where id>0 and id<=2147483647, where id is the primary key, it returns all the data in a couple of seconds.

Should I always include a where clause, even when returning all records?

Table schema

CREATE TABLE table
(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    batch_id integer,
    area_id integer,
    asset_group text COLLATE pg_catalog."default",
    asset_id text COLLATE pg_catalog."default",
    parent_id text COLLATE pg_catalog."default",
    reference_key text COLLATE pg_catalog."default",
    maintainer_code text COLLATE pg_catalog."default",
    type_code text COLLATE pg_catalog."default",
    super_type_code text COLLATE pg_catalog."default"
)

The primary key is an integer. If I specify the whole integer range, the data comes back quickly, but without the where clause the query takes about an hour. Even if I list specific columns, for example select id,type_code from table, it is very slow compared to select id,type_code from table where id>0 and id<=2147483647

Below is the execution plan without using where:

 Seq Scan on table  (cost=0.00..6894676.46 rows=630746 width=379) (actual time=2590902.656..4068047.762 rows=792777 loops=1)
Planning Time: 0.095 ms
Execution Time: 4068076.818 ms

And when using where:

Bitmap Heap Scan on table  (cost=597265.81..1252327.52 rows=630747 width=379) (actual time=72.493..211.108 rows=792777 loops=1)
  Recheck Cond: ((id > 0) AND (id < 2147483647))
  Heap Blocks: exact=30533
  ->  Bitmap Index Scan on pk_information_model_entry  (cost=0.00..597108.12 rows=630747 width=0) (actual time=64.017..64.017 rows=792777 loops=1)
        Index Cond: ((id > 0) AND (id < 2147483647))
Planning Time: 8.594 ms
Execution Time: 233.207 ms

I'm aware that using an index can improve performance, but why does adding a where clause make such a difference?

Answer 1

Your table seems to be massively bloated (full of totally empty pages). Using the index lets PostgreSQL skip reading those pages. You could fix it with a VACUUM FULL of the table, or with something like pg_squeeze.
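
The two plans in the question give a rough way to size the bloat. For a plain sequential scan with no filter, the planner's total cost is approximately (heap pages * seq_page_cost) + (estimated rows * cpu_tuple_cost). The sketch below assumes the default settings seq_page_cost = 1.0 and cpu_tuple_cost = 0.01, the default 8 kB block size, and a non-parallel plan; none of these are confirmed by the question, so treat the result as an estimate.

```sql
-- Back out the heap page count from the Seq Scan cost in the question:
--   cost=0.00..6894676.46 rows=630746
-- and compare it with the "Heap Blocks: exact=30533" line of the bitmap plan.
-- Assumed (not stated in the question): seq_page_cost = 1.0,
-- cpu_tuple_cost = 0.01, block size = 8192 bytes.
SELECT
    6894676.46 - 630746 * 0.01                     AS estimated_heap_pages,
    (6894676.46 - 630746 * 0.01) * 8192            AS estimated_heap_bytes,
    30533                                          AS pages_holding_rows,
    30533 * 8192                                   AS bytes_holding_rows,
    round(100.0 * 30533
          / (6894676.46 - 630746 * 0.01), 2)       AS percent_of_pages_with_rows;
```

Under those assumptions the heap works out to 6,888,369 pages (about 52.5 GiB), of which only 30,533 pages (about 239 MiB) contain rows the query returns. That would be consistent with the sequential scan spending over an hour in IO: DataFileRead while the index-driven scan finishes in well under a second.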

You might also want to investigate how it got that way in the first place, so you can prevent it from recurring.
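
A possible starting point for that investigation is sketched below. It uses the standard statistics views pg_stat_user_tables and pg_stat_activity; my_table is a placeholder for the real table name, and the suspects it looks at (vacuum not keeping up, long-open transactions) are common causes in general, not something the question confirms.

```sql
-- my_table is a hypothetical placeholder; substitute the real table name.

-- 1. Has vacuum been running on this table, and how many dead rows remain?
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum,
       autovacuum_count
FROM pg_stat_user_tables
WHERE relname = 'my_table';

-- 2. Are there long-open transactions? These can prevent vacuum from
--    removing dead rows, which lets a table grow far beyond its live data.
SELECT pid,
       state,
       xact_start,
       now() - xact_start AS transaction_age,
       left(query, 60)    AS query_start
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;
```

If the table is regularly emptied or rewritten by a batch job, the pattern of that job (for example, deleting and reloading most rows) is also worth reviewing, since it determines how quickly the empty space reappears.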

To reduce planning time, PostgreSQL doesn't consider using an index unless it "might possibly be useful". But just overcoming extreme bloat is not considered to be "possibly useful", which is why it only uses the index after you introduce a dummy WHERE clause which references the column.
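
To confirm the diagnosis and apply the VACUUM FULL fix mentioned above, something like the following could be used. It is a sketch only: my_table is a placeholder name, and VACUUM FULL rewrites the whole table while holding an ACCESS EXCLUSIVE lock, so reads and writes on the table are blocked until it finishes.

```sql
-- my_table is a hypothetical placeholder; substitute the real table name.

-- Before: compare the on-disk heap size with the planner's row estimate.
-- A heap of many gigabytes for well under a million rows points to bloat.
SELECT c.relname,
       c.relpages,
       c.reltuples::bigint                     AS estimated_rows,
       pg_size_pretty(pg_relation_size(c.oid)) AS heap_size
FROM pg_class AS c
WHERE c.relname = 'my_table'
  AND c.relkind = 'r';

-- Rewrite the table into a compact new file and refresh planner statistics.
-- Blocks all access to the table for the duration.
VACUUM (FULL, VERBOSE, ANALYZE) my_table;

-- After: rerun the size query above, then check that the unfiltered
-- query no longer needs the dummy WHERE clause to be fast.
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM my_table;
```

Once the empty pages are gone, the plain sequential scan only has to read the pages that actually hold rows, so the range condition on id should no longer make a meaningful difference.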

1 answers

solution1
1 2022-08-17 13:12:00