简体   繁体   中英

Which operating systems, and how, can pin pages in a database buffer pool?

Most relational database construction textbooks talk about the concept of being able to pin a page, ie prevent the operating system from swapping it out of memory. The concept is so that the database software can use it's own buffer replacement algorithm, which might be a better fit than whatever the OS virtual memory policy provides.

It is unclear to me whether typical desktop operating systems actually provide the programmer with the capability to pin pages. The best I can find on OS X, for example, refers to wired pages, but these seem to be only usable by the superuser.

Is the concept of pinning pages, and of defining appropriate buffer replacement strategies that supersede that of the OS, only of theoretical interest and not really implemented by real relational database systems? Or is it the case that typical desktop OS'es (Linux, Windows, OS X) do include hooks for pinning, and typical relational DB software (Oracle, SQL Server, PostgreSQL, MySQL, etc) uses them?

In PostgreSQL, the database server copies the pages from the file (or from the OS, really) into a shared memory segment which PostgreSQL controls. The OS doesn't know what the mapping is between the file system blocks and the shared memory blocks, so the OS couldn't write those pages back out to their disk locations even if it wanted to, until PostgreSQL tells it to do so by issuing a seek and a write .

The OS could decide to swap parts of shared memory out to disk into a swap partition (for example, if it were under severe memory stress), but it can't write them back to their native location on disk since it doesn't know what that location is.

There are ways to tell the OS not to page out certain parts of memory, such as shmctl(shmid,SHM_LOCK,NULL) . But these are mostly intended for security purposes, not performance purposes. For example, you use it to prevent very sensitive information (like the decrypted copy of a private key) from accidentally getting written to swap partitions, from which it might be recovered by the bad guys.

@jjanes is correct to say that the OS can't really write out Pg's shared memory buffer, and can't control what PostgreSQL reads into it, so it doesn't make sense to "pin" it. But that's only half the story.

PostgreSQL does not offer any feature for pinning pages from tables in its shared memory segment. It could do so, and it might arguably be useful, but nobody has implemented it. In most cases the buffer replacement algorithm does a pretty good job by its self.

Partly this is because PostgreSQL relies heavily on the operating system's buffer caches, rather than trying to implement its own. Data might be evicted from shared_buffers , but it's usually still cached in the OS. It's not unreasonable to think of shared_buffers as a first-level cache, and the OS disk cache as the second-level cache.

The features available to control what's kept in the operating system's disk cache are whatever the OS provides. In general, that's not much, because again modern OSes tend to do a better job if you leave them alone and let them manage things themselves.

The idea of manual buffer management, etc, is IMO largely a relic of times when systems had simpler and less effective algorithms for managing caches and buffers automatically.

The main time that automation falls down is if you have something that's used only intermittently, but you want to ensure is available with extremely good response times when it is used; ie you wish to degrade the overall system's throughput to make one part of it more responsive. PostgreSQL doesn't offer much control over that; most people simply ensure that they have something regularly querying the data of interest to keep it warm in the cache.

You could write a relatively simple extension to mmap() a file and mlock() its range, but it'd be pretty wasteful and you'd have to fiddle with the default OS limits designed to stop you from locking too much memory.

(FWIW, I think Oracle offers quite a bit of control over pinning relations, indexes, etc, in tune with its "manually control everything whether you want to or not" philosophy, and it bypasses much of the operating system in the process.)

Speaking for SQL Server (on Windows, obviously), there's an OS setting that allows the SQL engine to ignore requests from the OS in response to memory pressure. That setting is called Lock Pages in Memory (LPIM). That permissions is granted on a per-account basis and needs to be granted to the account running your SQL service when the service is started.

Keep in mind that this isn't always a good idea. For example, in a virtualized environment, the hypervisor communicates its memory needs via a balloon driver process in the guest. If the hypervisor needs more memory, it inflates the memory needs of the balloon in the guest. If your SQL process has LPIM turned on, it won't respond and the hypervisor can start flagging as a result. And if the hypervisor isn't happy, ain't nobody happy.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM