
How does postgres implement a sequential scan?

Original post 2020-07-23 18:36:43 6 1 sql/ postgresql

I understand that when a query is estimated to need the majority of a table in its result set, a sequential scan may be preferred over using an index.

What I'm curious about is how postgres actually reads the pages into memory?

Does it organise them into some kind of ad-hoc in memory index whilst it reads them?

What if the table's too large to fit into memory?

Are there any high level papers on the topic?

(I've done some searching but results are full of blog posts explaining the basics of indexing, not the implementation details of a sequential scan. I expect it's not as straightforward as read into an array when evaluating a join condition over most of a table)

1 answer

What I'm curious about is how postgres actually reads the pages into memory?

The engine reads every block of the heap, with no guaranteed row order, and discards rows marked as deleted as it goes. Hot blocks (those already present in the cache) are much faster to process.

Does it organise them into some kind of ad-hoc in memory index whilst it reads them?

No, a sequential scan avoids indexes and reads the heap directly using buffering and the cache.

What if the table's too large to fit into memory?

A sequential scan is pipelined. This means blocks are read as needed: the engine does not need to have the whole heap in memory before it starts processing it. It reads a few blocks, processes them and discards them, then repeats this until it has read all the blocks of the heap.
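The pipelined behaviour described above can be sketched in Python. This is a toy model under stated assumptions, not PostgreSQL's actual code: the page layout, the `deleted` flag, and the names `seq_scan` and `read_page` are all made up for illustration.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Toy model: a "page" is a list of (row, deleted) pairs.
# This layout is hypothetical, not PostgreSQL's on-disk format.
Row = dict
Page = List[Tuple[Row, bool]]


def seq_scan(read_page: Callable[[int], Page],
             n_pages: int,
             predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
    """Yield live rows page by page; only the current page is held at a time."""
    for page_no in range(n_pages):
        page = read_page(page_no)      # fetch one block (cache hit or disk read)
        for row, deleted in page:
            if deleted:
                continue               # skip rows marked as deleted
            if predicate is None or predicate(row):
                yield row              # hand the row to the parent operator
        # the reference to 'page' is replaced on the next iteration


# Example "heap" of three pages; a real table could be far larger than RAM.
heap = [
    [({"id": 1}, False), ({"id": 2}, True)],
    [({"id": 3}, False)],
    [({"id": 4}, False), ({"id": 5}, False)],
]

result = [r["id"] for r in seq_scan(lambda n: heap[n], len(heap),
                                    lambda r: r["id"] != 4)]
```

Because `seq_scan` is a generator, the consumer pulls rows one at a time, so memory use depends on the page size rather than on the table size. No lookup structure is built along the way.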

Are there any high level papers on the topic?

There should be. In any case, any good book on query optimization will describe this process in detail.

EDIT For Your Second Question:

What I guess I mean is if you're joining on some random column X, does it have to iterate through each possible row multiple times to find the correct row for each value in the other table, or does it do something more advanced than that?

Well, when you join two (or more) tables, the query planner produces a plan that includes a "Nested Loop", a "Hash Join", or a "Merge Join" operator. There are more operators, but these are the common ones.

The Nested Loop Join retrieves, for each row of the first table, the matching rows of the linked table. It could perform an index seek or scan on the related table (ideal) or a full table scan (not ideal).
The Hash Join hashes the secondary table first (incurring a high startup cost) and then joins fast.
The Merge Join sorts both tables by the join key (assuming an equi-join), again incurring a heavy startup cost, and then joins fast (like a zipper).
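The three strategies can be compared with a small Python sketch. It is an illustration under assumptions, not the engine's implementation: rows are made-up `(key, payload)` tuples, the function names are hypothetical, and everything is held in in-memory lists.

```python
from collections import defaultdict
from typing import Iterator, List, Tuple

Row = Tuple[int, str]          # (join key, payload): a made-up row shape
Pair = Tuple[Row, Row]


def nested_loop_join(outer: List[Row], inner: List[Row]) -> Iterator[Pair]:
    """Rescan the inner input for every outer row (no index available)."""
    for o in outer:
        for i in inner:
            if o[0] == i[0]:
                yield o, i


def hash_join(outer: List[Row], inner: List[Row]) -> Iterator[Pair]:
    """Build a hash table on the inner input once, then probe it per outer row."""
    table = defaultdict(list)
    for i in inner:                        # startup cost: one pass over inner
        table[i[0]].append(i)
    for o in outer:
        for i in table.get(o[0], ()):
            yield o, i


def merge_join(outer: List[Row], inner: List[Row]) -> Iterator[Pair]:
    """Sort both inputs by key, then advance two cursors like a zipper."""
    left = sorted(outer, key=lambda r: r[0])    # startup cost: two sorts
    right = sorted(inner, key=lambda r: r[0])
    li = ri = 0
    while li < len(left) and ri < len(right):
        key = left[li][0]
        if key < right[ri][0]:
            li += 1
        elif key > right[ri][0]:
            ri += 1
        else:
            run_end = ri                   # find the run of equal keys on the right
            while run_end < len(right) and right[run_end][0] == key:
                run_end += 1
            while li < len(left) and left[li][0] == key:
                for j in range(ri, run_end):
                    yield left[li], right[j]
                li += 1
            ri = run_end


orders = [(2, "o-a"), (1, "o-b"), (2, "o-c"), (3, "o-d")]
users = [(1, "ann"), (2, "bob"), (4, "dan")]

nl = sorted(nested_loop_join(orders, users))
hj = sorted(hash_join(orders, users))
mj = sorted(merge_join(orders, users))
```

All three return the same set of matching pairs. They differ in how much work is done up front and in how many times the second input has to be read.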

Index Scan Vs Sequential scan in Postgres

Why does Postgres do a sequential scan where the index would return < 1% of the data?

Sequential scan in postgres is taking surprisingly long. How do I determine the hardware bottleneck?

How do I make postgres avoid doing a double sequential scan for this seek pagination query?

Postgres does a expensive index scan

Postgres select query making sequential scan instead of index scan on table with 18 Million rows

how to quickly mass update sequential numbers in postgres

How to implement Parallelism in Postgres

How to tweak index_scan cost in postgres?

A sequential scan rather than index scan




solution1 1 ACCEPTED 2020-07-23 18:47:54
