
C++ low-level I/O details: reading less than one block

I am dealing with large files and I'd like to improve writing and reading operations.

I need to read a file bigger than 1GB sequentially (at least at the beginning). Does it make sense to calculate the right number of bytes to read (so that each read is a multiple of the block size), or does it make no difference because the read operation is already optimized?

What I mean is this: as I see it (correct me if I am wrong), if I tell the OS to read 8 bytes, it will actually read a whole block (presumably 4KB). When I then tell the OS to read the next 8 bytes, it should already have them in cache, since it previously read a complete block, right? So it should make no difference whether I read a file sequentially 8 bytes at a time or 4KB at a time. Is that right?

Your intuition is right: what you do in userspace is optimized on several levels.

At the Operating System Level

First of all, even if you ask the OS for 8 bytes at a time, it employs readahead mechanisms and issues read requests to the device in larger chunks. Readahead is not triggered on every request, since that would waste resources; the OS uses heuristics to decide whether reading a larger chunk is worthwhile.

For instance, on my system the readahead size is 256 sectors of 512 bytes, i.e. 128KB:

➜  ~ [3] at 21:41:06 [Mon 1] $ sudo blockdev --getra  /dev/sda 
256

The OS might therefore decide to read in 128KB chunks. Consider for example reading a file sequentially with dd, one sector at a time:

➜  ~ [3] at 21:43:23 [Mon 1] $ dd if=bigfile of=/dev/null bs=512

and checking the I/O statistics with iostat:

➜  ~ [3] at 21:44:42 [Mon 1] $ iostat -cxth /dev/sda 1 1000

This samples the I/O statistics once per second, 1000 times. Before checking the iostat output, it's worth verifying that dd is actually reading 512 bytes at a time.

mguerri-dell ~ [3] at 21:58:11 [Mon 1] $ strace dd if=bigfile of=/dev/null bs=512 count=32
[...]
read(0, "hb\342J\300\371\346\321i\326v\223Ykd\320\211\345X-\202\245\26/K\250\244O?3\346N"..., 512) = 512
[...]

This confirms that dd is reading in 512-byte chunks. The output of iostat is the following:

12/01/2014 09:46:07 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.50    0.00   10.75    0.00    0.00   64.75

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  418.00    0.00 53504.00     0.00   256.00     0.11    0.26    0.26    0.00   0.25  10.60

12/01/2014 09:46:08 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.29    0.00   11.14    0.00    0.00   65.57

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  420.00    0.00 53760.00     0.00   256.00     0.11    0.25    0.25    0.00   0.25  10.60

12/01/2014 09:46:09 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.13    0.00   11.94    0.00    0.00   63.93

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  410.00    0.00 52480.00     0.00   256.00     0.10    0.25    0.25    0.00   0.25  10.30

The meaning of the most important fields is the following:

r/s       Number of read requests per second
rkB/s     KB read per second
avgrq-sz  Average size (in 512-byte sectors) of the requests sent to the device,
          considering both read and write operations. Since here I am doing 
          mostly read operations, we can ignore the contribution of write operations.

You can check that, in every one-second sample, rkB/s divided by r/s is 128KB, which is exactly the 256 sectors reported by avgrq-sz. The OS is therefore reading from the device in 128KB chunks.

The OS won't always employ readahead techniques. Consider reading just a couple of KB from your file (I flushed the page cache beforehand, to make sure I was not reading directly from the OS page cache):

dd if=bigfile  of=/dev/null bs=512 count=8

This is the result I get:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00    1.00    0.00    16.00     0.00    32.00     0.00    2.00    2.00    0.00   2.00   0.20

iostat cannot restrict its output to a single process, but on an otherwise quiet disk you may be able to catch just that process's activity. In this case I was reading 4KB from the file, and at that moment a single read request of 32 sectors (16KB) was issued, as the r/s, rkB/s and avgrq-sz columns show. The OS is still caching more pages of your file than you asked for, but it is not reading in 128KB chunks.

At the C++ stdlib Level

In your case, since you are writing C++ code, you have an additional layer between the operating system and your code, the C++ stdlib.

Consider the following example:

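#include <fstream>
#include <iostream>

// Size of the destination buffer and of each individual read, in bytes.
constexpr int BUFF_SIZE = 100;
constexpr int RD_SZ = 8;

int main() {
    char buff[BUFF_SIZE];
    std::fstream f;
    f.open("bigfile", std::ios::in | std::ios::binary);
    if (!f.is_open()) {
        std::cerr << "cannot open bigfile" << std::endl;
        return 1;
    }

    // Two consecutive 8-byte reads; gcount() reports the bytes delivered.
    // std::endl flushes, so each count reaches stdout as its own write.
    f.read(buff, RD_SZ);
    std::cout << f.gcount() << std::endl;
    f.read(buff, RD_SZ);
    std::cout << f.gcount() << std::endl;

    f.close();
    return 0;
}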

The output is of course:

➜ mguerri-dell ~ [3] at 22:32:03 [Mon 1] $ g++ io.cpp -o io
➜ mguerri-dell ~ [3] at 22:32:04 [Mon 1] $ ./io          
8
8

But strace shows that just one read syscall is issued, reading 8191 bytes.

➜ mguerri-dell ~ [3] at 22:33:22 [Mon 1] $ strace ./io
[...]
open("bigfile", O_RDONLY)               = 3
read(3, "hb\342J\300\371\346\321i\326v\223Ykd\320\211\345X-\202\245\26/K\250\244O?3\346N"..., 8191) = 8191
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 16), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faecc811000
write(1, "8\n", 2)                       = 2
write(1, "8\n", 2)                       = 2
close(3)
[...]

After the first read, the C++ stdlib has already buffered about 8KB of data, so the second call does not even need to issue a syscall: the data is already available in the stdlib buffer. Had the data not been available, a read syscall would have been issued, but it would probably have hit the OS page cache, avoiding a request to the device.

Having seen how these two caching mechanisms work

I would recommend reading 4KB at a time. This reduces the overhead that comes with every single call on a C++ file stream, while the OS and the C++ stdlib still optimize the access to the device.
