How do on-disk databases handle file reading and writing on a file system level?

Suppose I were to write my own database in C++, using a binary tree or a hash map as the underlying data structure. How would I handle updates to this data structure?

1) Should I first create the binary tree and then somehow persist it to disk? And every time data has to be updated, do I need to open this file and update it? Wouldn't that be a costly operation?

2) Is there a way to directly work on the binary tree without loading it into memory and then persisting again?

3) How do SQLite and MySQL deal with it?

4) My main question is: how do databases persist huge amounts of data and concurrently make updates to it without opening and closing the file each time?

Databases treat the disk or file as one big block device and organize its blocks in M-way balanced trees (B-trees). They insert, update, and delete records in these blocks and flush the dirty blocks back to disk. They maintain allocation tables of free blocks, so the database file does not need to be rewritten on each access. Because RAM is fast but expensive, pages are kept in a RAM cache. Separate indexes (either separate files or just further blocks) provide quick access by key. The block size often matches the native allocation size of the underlying filesystem (e.g. the cluster size). Undo/redo logs are maintained for atomicity, and so on.
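To make the block-and-cache idea concrete, here is a minimal sketch of a pager in C++. It is not how any particular database implements it: the names Pager, Page, get, mark_dirty and flush are hypothetical, the 4096-byte page size is an assumption, and the cache never evicts anything. It only shows the pattern of keeping one file open, reading fixed-size blocks on demand, and writing back only the dirty ones.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>
#include <unordered_map>

// Assumed page size; real systems often match the filesystem block size.
constexpr std::size_t kPageSize = 4096;
using Page = std::array<char, kPageSize>;

// Hypothetical pager: one open file, an unbounded in-memory page cache.
class Pager {
public:
    explicit Pager(const std::string& path) {
        const auto mode = std::ios::in | std::ios::out | std::ios::binary;
        file_.open(path, mode);
        if (!file_) {
            // The file does not exist yet: create it, then reopen read/write.
            std::ofstream create(path, std::ios::binary);
            create.close();
            file_.open(path, mode);
        }
        if (!file_) throw std::runtime_error("cannot open " + path);
    }

    ~Pager() { flush(); }

    // Return the cached page, loading it from disk on a cache miss.
    // Pages beyond the end of the file come back zero-filled.
    Page& get(std::uint32_t page_no) {
        auto it = cache_.find(page_no);
        if (it != cache_.end()) return it->second.data;

        Entry entry{};  // zero-initialized buffer, dirty == false
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(page_no) * kPageSize);
        file_.read(entry.data.data(), kPageSize);
        file_.clear();  // a short read past EOF is not an error here
        return cache_.emplace(page_no, entry).first->second.data;
    }

    // The caller marks a page dirty after modifying it in memory.
    void mark_dirty(std::uint32_t page_no) {
        assert(cache_.count(page_no) == 1);
        cache_.at(page_no).dirty = true;
    }

    // Write every dirty page back to its offset in the file.
    void flush() {
        for (auto& kv : cache_) {
            if (!kv.second.dirty) continue;
            file_.clear();
            file_.seekp(static_cast<std::streamoff>(kv.first) * kPageSize);
            file_.write(kv.second.data.data(), kPageSize);
            kv.second.dirty = false;
        }
        file_.flush();
    }

private:
    struct Entry {
        Page data;
        bool dirty = false;
    };
    std::fstream file_;
    std::unordered_map<std::uint32_t, Entry> cache_;
};
```

A tree node would then live inside one page and refer to its children by page number rather than by pointer, so following a link becomes a call to get(). A real implementation would also bound the cache with an eviction policy and write a log record before flushing; both are left out here.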

There is much more to tell, and this question really belongs on Computer Science Stack Exchange. For more information, read Horowitz & Sahni, "Fundamentals of Data Structures", p. 496.

As to your questions:

  1. You open the file once and keep it open while your database manager is running. You allocate storage as needed and maintain an M-way tree as described above.

  2. Yes. You read blocks that you keep in a cache.

  3. and 4. See above.
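The point about allocating storage as needed can be illustrated with a small free-list allocator. This is only a sketch under stated assumptions: the class name PageAllocator and its methods are hypothetical, and the free list lives in memory, whereas a real database would store it in the file itself (for example in a header block). The idea it shows is that released blocks are reused, and the file only grows when no free block is available.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical allocator that hands out page numbers for a block file.
class PageAllocator {
public:
    // Reuse a previously released page if there is one,
    // otherwise extend the file by one page.
    std::uint32_t allocate() {
        if (!free_pages_.empty()) {
            std::uint32_t page_no = free_pages_.back();
            free_pages_.pop_back();
            return page_no;
        }
        return page_count_++;
    }

    // Return a page to the free list so a later allocate() can reuse it.
    void release(std::uint32_t page_no) {
        assert(page_no < page_count_);
        free_pages_.push_back(page_no);
    }

    // Number of pages the file must hold (used plus free).
    std::uint32_t page_count() const { return page_count_; }

private:
    std::uint32_t page_count_ = 0;
    std::vector<std::uint32_t> free_pages_;
};
```

Deleting a record that empties a block therefore costs one release() call, not a rewrite of the file.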

An alternative is to avoid explicit file I/O for data access altogether. Use mmap to map the file into the virtual address space of the process and let the OS page cache take care of the reads and writes.
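A minimal sketch of the mmap approach on a POSIX system might look like this. The wrapper class MappedFile and its methods are hypothetical names; open, ftruncate, mmap, msync, munmap and close are the standard POSIX calls. The sketch assumes a fixed file size chosen up front, and note that ftruncate would cut off an existing file larger than that size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical RAII wrapper around a read/write shared file mapping.
class MappedFile {
public:
    MappedFile(const std::string& path, std::size_t size) : size_(size) {
        fd_ = ::open(path.c_str(), O_RDWR | O_CREAT, 0644);
        if (fd_ < 0) throw std::runtime_error("open failed");

        // Make the file exactly `size` bytes long (assumption: fixed size).
        if (::ftruncate(fd_, static_cast<off_t>(size_)) != 0) {
            ::close(fd_);
            throw std::runtime_error("ftruncate failed");
        }

        // MAP_SHARED makes stores visible in the file, not just in memory.
        void* p = ::mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                         MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            ::close(fd_);
            throw std::runtime_error("mmap failed");
        }
        data_ = static_cast<char*>(p);
    }

    ~MappedFile() {
        ::msync(data_, size_, MS_SYNC);  // push modified pages to the file
        ::munmap(data_, size_);
        ::close(fd_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    char* data() { return data_; }
    std::size_t size() const { return size_; }

    // Force modified pages out at a point the caller chooses.
    void sync() { ::msync(data_, size_, MS_SYNC); }

private:
    int fd_ = -1;
    char* data_ = nullptr;
    std::size_t size_ = 0;
};
```

With this in place, updating a record is an ordinary memory write into data(), and the OS decides when each page is read in or written out unless sync() is called. The trade-off is that the program gives up control over write ordering, which matters if undo/redo logging is also needed.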
