C++ low-level I/O details: reading less than one block

Question

I am dealing with large files and I'd like to improve writing and reading operations.

I need to read a file larger than 1GB sequentially (at least at the beginning). Does it make sense to calculate the right number of bytes to read (so that each read is a multiple of the block size), or does it make no difference because the read operation is already optimized?

What I mean is: as I see it (correct me if I am wrong), if I tell the OS to read 8 bytes, it will read a number of bytes equal to the block size (presumably 4KB). When I then tell the OS to read the next 8 bytes, it has already read the complete block, so the data should already be in cache, right? So it should make no difference whether I read a file sequentially 8 bytes at a time or 4KB at a time. Is that right?

Answer 1

Your intuition is right: what you do in userspace will be optimized on several levels.

At the Operating System Level

First of all, if you tell the OS to read 8 bytes at a time, it will employ readahead mechanisms and issue read requests to the device in larger chunks. This won't happen for every request, as that would just waste resources; the OS uses heuristics to decide whether or not to read a larger chunk.

For instance, on my system the readahead size is 256 sectors, 128KB:

$ sudo blockdev --getra /dev/sda
256

The OS might therefore decide to read in 128KB chunks. Consider for example reading a file sequentially with dd, one sector at a time:

$ dd if=bigfile of=/dev/null bs=512

and checking the I/O statistics with iostat:

$ iostat -cxth /dev/sda 1 1000

This samples the I/O statistics once per second, 1000 times. Before checking the iostat output, it's worth verifying that dd is actually reading 512 bytes at a time:

$ strace dd if=bigfile of=/dev/null bs=512 count=32
[...]
read(0, "hb\342J\300\371\346\321i\326v\223Ykd\320\211\345X-\202\245\26/K\250\244O?3\346N"..., 512) = 512
[...]

This confirms that dd is reading in 512-byte chunks. The output of iostat is the following:

12/01/2014 09:46:07 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.50    0.00   10.75    0.00    0.00   64.75

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  418.00    0.00 53504.00     0.00   256.00     0.11    0.26    0.26    0.00   0.25  10.60

12/01/2014 09:46:08 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          23.29    0.00   11.14    0.00    0.00   65.57

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  420.00    0.00 53760.00     0.00   256.00     0.11    0.25    0.25    0.00   0.25  10.60

12/01/2014 09:46:09 PM
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          24.13    0.00   11.94    0.00    0.00   63.93

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00  410.00    0.00 52480.00     0.00   256.00     0.10    0.25    0.25    0.00   0.25  10.30

The meaning of the most important fields is the following:

r/s       Number of read requests per second
rkB/s     KBs read per second
avgrq-sz  Average size (in 512-byte sectors) of the requests sent to the device,
          considering both read and write operations. Since here I am doing 
          mostly read operations, we can ignore the contribution of write operations.

You can check that in every sample rkB/s divided by r/s is 128KB (for example 53504 / 418 = 128), namely 256 sectors, as shown by avgrq-sz. The OS is therefore reading from the device in 128KB chunks.
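The same check can be written down as a small calculation. This is only an illustrative sketch: the helper names `avg_request_kb` and `kb_to_sectors` are made up for this example, and the 512-byte sector size is the unit iostat uses for avgrq-sz in the output above.

```cpp
#include <cassert>

// Sector size assumed by the avgrq-sz column in the iostat output above.
constexpr double kSectorBytes = 512.0;

// Average request size in KB: KB read per second divided by
// read requests per second (rkB/s divided by r/s).
double avg_request_kb(double rkb_per_s, double reads_per_s) {
    assert(reads_per_s > 0.0);  // no requests means no average
    return rkb_per_s / reads_per_s;
}

// Convert a size in KB into a number of 512-byte sectors.
double kb_to_sectors(double kb) {
    return kb * 1024.0 / kSectorBytes;
}
```

Feeding it the three samples (53504 / 418, 53760 / 420, 52480 / 410) gives 128KB each time, and 128KB corresponds to 256 sectors, which matches both avgrq-sz and the readahead value reported by blockdev.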

The OS won't always employ readahead techniques. Consider just reading a couple of KBs from your file (I flushed the page cache before, making sure I was not reading directly from the OS page cache):

dd if=bigfile  of=/dev/null bs=512 count=8

This is the result I get:

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda
                  0.00     0.00    1.00    0.00    16.00     0.00    32.00     0.00    2.00    2.00    0.00   2.00   0.20

It's not possible to show only the requests of a single process with iostat, but you might be able to catch only its activity. In this case I was reading 4KB from the file, and the sample taken at that moment shows a read rate of 1.00 r/s with an avgrq-sz of 32 sectors, i.e. a 16KB request. The OS is still caching more pages of your file than you asked for, but it's not reading in 128KB chunks.

At the C++ stdlib Level

In your case, since you are writing C++ code, you have an additional layer between the operating system and your code, the C++ stdlib.

Consider the following example:

#include <iostream>
#include <fstream>

#define BUFF_SIZE 100
#define RD_SZ 8

using namespace std;
int main() {
    char buff[BUFF_SIZE];
    fstream f;
    f.open("bigfile", ios::in | ios::binary );
    f.read(buff, RD_SZ);
    cout << f.gcount() << endl;
    f.read(buff, RD_SZ);
    cout << f.gcount() << endl;
    f.close();
 }

The output is of course:

$ g++ io.cpp -o io
$ ./io
8
8

But strace shows that just one read syscall is issued, reading 8191 bytes.

$ strace ./io
[...]
open("bigfile", O_RDONLY)               = 3
read(3, "hb\342J\300\371\346\321i\326v\223Ykd\320\211\345X-\202\245\26/K\250\244O?3\346N"..., 8191) = 8191
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 16), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7faecc811000
write(1, "8\n", 2)                       = 2
write(1, "8\n", 2)                       = 2
close(3)
[...]

After the first read, the C++ stdlib has already cached 8KB of data and the second call does not even need to issue a syscall since your data is available in the stdlib buffers. In fact, if the data had not been available, a read syscall would have been issued, but it would have probably hit the OS page cache, avoiding a request to the device.
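If you want to observe this buffering from inside the program rather than with strace, one option is to ask the stream buffer how many characters it already holds after a small read. The following is a minimal sketch and not part of the measurements above: the helper name `buffered_after_small_read` is hypothetical, and the value returned by `in_avail()` depends on the standard library implementation and its buffer size.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Reads `small_read` bytes from `path`, then returns how many further
// characters the stream buffer reports as immediately available.
// Returns -1 if the file cannot be opened or is shorter than `small_read`.
std::streamsize buffered_after_small_read(const std::string& path,
                                          std::streamsize small_read = 8) {
    char buff[64];
    assert(small_read > 0 &&
           small_read <= static_cast<std::streamsize>(sizeof buff));

    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) {
        return -1;  // could not open the file
    }

    f.read(buff, small_read);
    if (f.gcount() != small_read) {
        return -1;  // file too short for the requested read
    }

    // When the get area is non-empty, in_avail() is the number of
    // characters still sitting in the stream's own buffer.
    return f.rdbuf()->in_avail();
}
```

With a buffer like the one seen in the strace output (one 8191-byte read), a call on a large file would be expected to report a few thousand buffered bytes after reading only 8, though the exact number is implementation-specific.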

Having seen how these two caching mechanisms work, I would recommend reading 4KB at a time, to reduce the overhead which comes from even just a single call on a C++ file stream, knowing that the OS and the C++ stdlib will optimize the access to the device.
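A sequential read loop with 4KB chunks could look like the sketch below. The function name `read_in_chunks` and its return convention are assumptions made for this example; the place where you would process each chunk is marked with a comment.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Reads `path` sequentially in chunks of `chunk_size` bytes.
// Returns the total number of bytes read, or -1 if the file cannot be opened.
long long read_in_chunks(const std::string& path, std::size_t chunk_size = 4096) {
    assert(chunk_size > 0);

    std::ifstream f(path, std::ios::in | std::ios::binary);
    if (!f) {
        return -1;
    }

    std::vector<char> buff(chunk_size);
    long long total = 0;

    // read() sets failbit when the last chunk is shorter than requested,
    // so gcount() is checked as well to pick up the final partial chunk.
    while (f.read(buff.data(), static_cast<std::streamsize>(buff.size())) ||
           f.gcount() > 0) {
        total += f.gcount();
        // Process buff[0 .. f.gcount()) here.
    }
    return total;
}
```

Calling it with `chunk_size` set to 8 instead of 4096 reads the same bytes; the difference is only in the number of calls made on the stream, which is the overhead the recommendation above is trying to reduce.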

solution1
3 ACCEPTED 2014-11-11 13:34:19
