简体   繁体   中英

Any use of buffering for writing data on linux ext4 filesystem?

  1. I am using ext4 on linux 2.6 kernel. I have records in byte arrays, which can range from few hundred to 16MB. Is there any benefit in an application using write() for every record as opposed to saying buffering X MB and then using write() on X MB?

  2. If there is a benefit in buffering, what would be a good value for ext4. This question is for someone who has profiled the behavior of the multiblock allocator in ext4.

  3. My understanding is that filesystem will buffer in multiples of pagesize and attempt to flush them on disk. What happens if the buffer provided to write() is bigger than filesystem buffer? Is this a crude way to force filesystem to flush to disk()

The "correct" answer depends on what you really want to do with the data.

write(2) is designed as single trip into kernel space, and provides good control over I/O. However, unless the file is opened with O_SYNC, the data goes into kernel's cache only, not on disk. O_SYNC changes that to ensure file is synchroinized to disk. The actual writing to disk is issued by kernel cache, and ext4 will try to allocate as big buffer to write to minimize fragmentation, iirc. In general, write(2) with either buffered or O_SYNC file is a good way to control whether the data goes to kernel or whether it's still in your application's cache.

However, for writing lots of records, you might be interested in writev(2) , which writes data from a list of buffers. Similarly to write(2) , it's an atomic call (though of course that's only in OS semantics, not actually on disk, unless, again, Direct I/O is used).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM