简体   繁体   中英

Heap vs Clustered index full table scan

I've been googling back and forth for this, and could not get a grasp of how are the table data blocks structured on the disk.

Many resources state that doing a full table scan reads the blocks sequentially (which means that the DB is able to read multiple block at a time), but i couldn't find any resource actually describing how are blocks kept on disk in the case of a heap VS the case of a clustered index.

Heaps do not dictate order, which reasons with the fact that the DB does not care about the order of blocks that it reads from the disk but:

  1. I still didn't find any evidence which guarantees that heap data is stored sequentially on disk
  2. With a clustered index, the order of the results does matter. In that case, i can't understand how can the DB keep blocks sequentially while still keeping the order. Does sequential reads still hold with a clustered index?

Any resource which describes how blocks are laid out on disk in each case , would help

You asked about MySQL, and that generally means the InnoDB storage engine, which is the default.

InnoDB does not store tables as a heap.

InnoDB tables are always stored as a clustered index, where the clustered index is the primary key. A table-scan is therefore more or less equivalent to an index-scan of the clustered index.

Any index in InnoDB is not usually stored sequentially on disk. It's stored as a collection of pages, where page have a uniform size of 16KB. The index is obviously much larger than this, and over time insertions and updates expand parts of the index in the middle as well as at the end. To do this efficiently (that is, without needing to rewrite the whole table), random insertions and updates result in the pages being out of order. New pages created are placed wherever there is room in the file.

To facilitate scanning through all the pages, each page contains links to the location of the next page and the preceding page. These may be quite far away in the file, so a table-scan will not actually be sequential, it will involve many seeks to other locations in the file.

InnoDB requires that pages are loaded into RAM before it can actually use them in queries. The InnoDB buffer pool is a fixed-size allocation of RAM, which contains a set of pages loaded from disk. Once the pages are in the buffer pool, they can be accessed very quickly, and with virtually no overhead for following links. The overhead of reading a page from disk into the buffer pool is orders of magnitude much greater than reading a page once it is in RAM.

So in the case of MySQL:

  • There is no heap
  • Sequential order by clustered index has nothing to do with sequential storage on disk
  • Reads are made to pages in RAM anyway, so the physical layout on disk has little to do with the order pages will be read

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM