Unbuffered I/O in Linux

Question

I'm writing lots and lots of data that will not be read again for weeks. As my program runs, the amount of free memory on the machine (displayed with 'free' or 'top') drops very quickly, while the amount of memory my app uses does not increase, and neither does the amount of memory used by other processes.

This leads me to believe the memory is being consumed by the filesystem cache. Since I do not intend to read this data for a long time, I'm hoping to bypass the system buffers so that my data is written directly to disk. I don't have dreams of improving performance or being a super ninja; my hope is to give the filesystem a hint that I'm not going to be coming back for this data any time soon, so it shouldn't spend time optimizing for that case.

On Windows I've faced similar problems and fixed them using FILE_FLAG_NO_BUFFERING|FILE_FLAG_WRITE_THROUGH: my app no longer consumed the machine's memory, and the machine was more usable in general. I'm hoping to duplicate the improvements I've seen, but on Linux. On Windows there is the restriction of writing in sector-sized pieces; I'm happy with this restriction for the amount of gain I've measured.

Is there a similar way to do this in Linux?

Answer 1

The closest equivalent to the Windows flags you mention that I can think of is to open your file with the open(2) flags O_DIRECT | O_SYNC:

   O_DIRECT (Since Linux 2.4.10)
          Try to minimize cache effects of the I/O to and from this file.  In
          general this will degrade performance, but it is useful in special
          situations, such as when applications do their own caching.  File I/O
          is done directly to/from user space buffers.  The O_DIRECT flag on its
          own makes an effort to transfer data synchronously, but does not
          give the guarantees of the O_SYNC flag that data and necessary metadata
          are transferred.  To guarantee synchronous I/O, O_SYNC must be used in
          addition to O_DIRECT.  See NOTES below for further discussion.

          A semantically similar (but deprecated) interface for block devices is
          described in raw(8).
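A minimal sketch of this suggestion in C. It assumes a 4096-byte alignment (the real requirement depends on the filesystem and device) and uses a hypothetical helper name, write_direct. It shows what the man page excerpt leaves implicit: with O_DIRECT the buffer address, file offset and length generally all need to be aligned, so the buffer comes from posix_memalign() rather than malloc(). Some filesystems do not support O_DIRECT at all, in which case open() fails with EINVAL.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Assumed alignment; the real requirement depends on filesystem and device. */
#define DIRECT_ALIGN 4096

/* Write nblocks * DIRECT_ALIGN bytes of 'fill' to path with O_DIRECT | O_SYNC.
   Returns 0 on success or a negative errno value (e.g. -EINVAL if the
   filesystem does not support O_DIRECT). write_direct is a hypothetical name. */
static int write_direct(const char *path, size_t nblocks, int fill)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT | O_SYNC, 0644);
    if (fd < 0)
        return -errno;

    size_t len = nblocks * DIRECT_ALIGN;
    void *buf = NULL;
    /* The user buffer itself must be aligned, so plain malloc() is not enough. */
    int rc = posix_memalign(&buf, DIRECT_ALIGN, len);
    if (rc != 0) {
        close(fd);
        return -rc;
    }
    memset(buf, fill, len);

    /* Offset 0 and a length that is a multiple of DIRECT_ALIGN keep the write aligned. */
    ssize_t n = write(fd, buf, len);
    int err = 0;
    if (n < 0)
        err = -errno;
    else if ((size_t)n != len)
        err = -EIO; /* treat a short write as an error in this sketch */
    free(buf);
    if (close(fd) != 0 && err == 0)
        err = -errno;
    return err;
}
```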

Granted, while researching this flag to confirm it's what you want, I found this interesting piece telling you that unbuffered I/O is a bad idea, in which Linus describes it as "brain damaged". According to that, you should be using madvise() instead to tell the kernel how to cache pages. YMMV.

Answer 2

You can use O_DIRECT, but in that case you need to do the block I/O yourself: you must write in multiples of the filesystem block size and on block boundaries (it is possible that this is not mandatory, but if you do not, performance will be roughly 1000x worse, because every unaligned write will need a read first).
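The alignment rule above can be expressed as two small helpers. This is a sketch: block stands for whatever alignment your filesystem and device actually require (4096 is only a common value), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest multiple of block that is >= len (works for any block > 0). */
static size_t round_up_to_block(size_t len, size_t block)
{
    return (len + block - 1) / block * block;
}

/* True if a write of len bytes from buf at file offset off meets the usual
   O_DIRECT rule: buffer address, offset and length all multiples of block. */
static bool is_direct_aligned(const void *buf, uint64_t off, size_t len, size_t block)
{
    return (uintptr_t)buf % block == 0 && off % block == 0 && len % block == 0;
}
```

An unaligned write would have to be padded up to round_up_to_block(len, block) bytes, which is where the read-before-write cost comes from when the kernel has to fill in the rest of the block.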

Another, much less invasive way to stop your blocks from using up the OS cache, without using O_DIRECT, is posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED). Under Linux 2.6 kernels that support it, this immediately discards (clean) blocks from the cache. Of course, you need to call fdatasync() or similar first; otherwise the blocks may still be dirty and hence won't be cleared from the cache.
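A minimal sketch of the fdatasync() then posix_fadvise(POSIX_FADV_DONTNEED) sequence; flush_and_drop is a hypothetical name. Note that posix_fadvise() returns an error number directly instead of setting errno, and that a len of 0 means "from offset to the end of the file".

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Flush fd's data to disk, then advise the kernel to drop the byte range
   [offset, offset + len) from the page cache (len 0 means "to end of file").
   Returns 0 on success or an errno value. flush_and_drop is a hypothetical name. */
static int flush_and_drop(int fd, off_t offset, off_t len)
{
    /* Dirty pages cannot be dropped, so write them back first. */
    if (fdatasync(fd) != 0)
        return errno;
    /* posix_fadvise() returns an error number directly; it does not set errno. */
    return posix_fadvise(fd, offset, len, POSIX_FADV_DONTNEED);
}
```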

It is probably a bad idea to call fdatasync() and posix_fadvise(... POSIX_FADV_DONTNEED) after every write; instead, wait until you've written a reasonable amount (50 MB, maybe 100 MB).

So in short:

After every significant chunk of writes, call fdatasync() followed by posix_fadvise(... POSIX_FADV_DONTNEED).

This will flush the data to disk and immediately remove it from the OS cache, leaving space for more important things.
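The "flush and drop after every significant chunk" advice can be sketched as a small writer wrapper. Everything here is an assumption for illustration: the names cache_friendly_writer, cfw_write and cfw_flush are hypothetical, the 64 MiB threshold is simply a value in the 50 to 100 MB range suggested above (not a tuned number), and the descriptor is assumed to be opened without O_APPEND at offset 0, so the running byte count equals the file offset.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical threshold: flush and drop after this many new bytes. */
#define DROP_EVERY ((off_t)64 * 1024 * 1024)

/* Hypothetical wrapper; assumes fd was opened without O_APPEND at offset 0. */
struct cache_friendly_writer {
    int fd;
    off_t written;      /* total bytes written so far (== current offset) */
    off_t dropped_upto; /* pages below this offset were already dropped */
};

/* fdatasync(), then drop [dropped_upto, written) from the page cache.
   Returns 0 or an errno value. */
static int cfw_flush(struct cache_friendly_writer *w)
{
    if (fdatasync(w->fd) != 0)
        return errno;
    off_t len = w->written - w->dropped_upto;
    if (len == 0)
        return 0; /* nothing new; len 0 would mean "to end of file" */
    int rc = posix_fadvise(w->fd, w->dropped_upto, len, POSIX_FADV_DONTNEED);
    if (rc == 0)
        w->dropped_upto = w->written;
    return rc;
}

/* Write all of buf, then flush and drop once DROP_EVERY new bytes have
   accumulated. Returns 0 or an errno value. */
static int cfw_write(struct cache_friendly_writer *w, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(w->fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted before writing anything; retry */
            return errno;
        }
        p += n;
        len -= (size_t)n;
        w->written += n;
    }
    if (w->written - w->dropped_upto >= DROP_EVERY)
        return cfw_flush(w);
    return 0;
}
```

Calling cfw_flush() once more before close() drops the tail of the file as well.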

Some users have found that things like fast-growing log files can easily blow "more useful" data out of the disk cache, which greatly reduces the cache hit rate on a box that needs a lot of read cache but also writes logs quickly. This is the main motivation for this feature.

However, like any optimisation:

a) You're not going to need it

b) Do not do it (yet)

Answer 3

   as my program runs the amount of free memory on the machine drops very quickly

Why is this a problem? Free memory is memory that isn't serving any useful purpose. When it's used to cache data, at least there is a chance it will be useful.

If one of your programs requests more memory, file caches will be the first thing to go. Linux knows that it can re-read that data from disk whenever it wants, so it will just reclaim the memory and put it to a new use.
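To check whether the memory that free or top reports as gone is really page cache, you can compare the MemFree and Cached fields of /proc/meminfo before and during a run. A minimal sketch; the helper names are hypothetical, and the parser only assumes the usual "Name:   value kB" line layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If line is the /proc/meminfo entry for field (e.g. "Cached"), store its
   value in kB in *kb and return 1; otherwise return 0. */
static int parse_meminfo_line(const char *line, const char *field, long *kb)
{
    char name[64];
    long value;
    /* Lines look like "Cached:          123456 kB". */
    if (sscanf(line, "%63[^:]: %ld", name, &value) == 2 &&
        strcmp(name, field) == 0) {
        *kb = value;
        return 1;
    }
    return 0;
}

/* Return the value of field in kB, or -1 if /proc/meminfo or the field
   cannot be read. */
static long meminfo_kb(const char *field)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (parse_meminfo_line(line, field, &kb))
            break;
    }
    fclose(f);
    return kb;
}
```

If MemFree falls while Cached rises by about the same amount, the "missing" memory is reclaimable page cache rather than memory held by a process.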

It's true that Linux by default waits around 30 seconds (this is what the value used to be, anyhow) before flushing writes to disk. You can speed this up with a call to fsync(). But once the data has been written to disk, there's practically zero cost to keeping a cache of the data in memory.
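On current kernels, dirty data becomes eligible for writeback once it is older than the vm.dirty_expire_centisecs sysctl, which is expressed in hundredths of a second; its default of 3000 corresponds to the 30 seconds mentioned. A small sketch that reads the current value; read_proc_long is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>

/* Read a single integer from a /proc/sys file; returns -1 on failure.
   read_proc_long is a hypothetical helper name. */
static long read_proc_long(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value;
    int ok = (fscanf(f, "%ld", &value) == 1);
    fclose(f);
    return ok ? value : -1;
}

/* dirty_expire_centisecs is expressed in hundredths of a second. */
static double centisecs_to_seconds(long centisecs)
{
    return centisecs / 100.0;
}
```

For example, centisecs_to_seconds(read_proc_long("/proc/sys/vm/dirty_expire_centisecs")) gives the current expiry age in seconds (check for -1 first).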

Seeing as you write to the file and don't read from it, Linux will probably guess that this data is the best candidate to throw out, in preference to other cached data. So don't waste effort trying to optimise unless you've confirmed that it's a performance problem.

Answers: 3

Answer 1: 6 votes, 2011-01-16 05:27:26
Answer 2: 6 votes, 2011-01-16 13:47:14
Answer 3: 2 votes, 2011-01-16 05:39:38
