
The magic of STREAMS in Linux. When to finish?

Today at 5 a.m. I read an article about the read system call, and things became significantly clearer for me.

ssize_t read(int fd, void *buf, size_t count);

The design of *nix-like operating systems is amazing in its simplicity. Any entity is exposed through a file interface: you just ask to read some data from the fd into memory at the *buf pointer. It works the same way for network connections, files, and streams.

But a question arises. How do you distinguish these two cases? 1) The stream is empty, and the program needs to wait for new data. 2) The stream is closed, and the program needs to exit.

Here is a scenario:

  • Data is read from STDIN in a loop; STDIN is redirected from a pipe.
  • Some text data appears.
  • Do I just read byte by byte until, what, an EOF in memory, or 0 as the result of the read call?
  • How will the program know whether to wait for new input or to exit?

This is unclear in the case of endless or continuous streams.

UPD After speaking with @Bailey Kocin and reading some docs, I have the following understanding. Correct me if I'm wrong.

  • read holds the program execution and waits for count bytes.
  • When count bytes appear, read writes them into buf and execution continues.
  • When the stream is closed, read returns 0, which is a signal that the program may finish.

Question: Does EOF appear in buf?

UPD2 EOF is a constant that can be returned by the getc function:

 int ch;  /* int, not char, so that EOF is distinct from every byte value */

 /* display contents of file on screen */
 while ((ch = getc(fp)) != EOF) {
     putchar(ch);
 }

But in the case of read, the EOF value does not appear in buf. The read system call signals the end of the file by returning 0, rather than by delivering an EOF constant as data the way getc does.

EOF is a constant that can vary between systems, and it is used with getc.

Let's deal first with your original question. Note that man 7 pipe should give some useful information on this.

Say we have the standard input redirected to the input side of a descriptor created by a pipe call, as in:

pipe(p);
// ... fork a child to write to the output side of the pipe ...
dup2(p[0], 0);  // redirect standard input to input side

and we call:

bytes = read(0, buf, 100);

First, note that this behaves no differently than simply reading directly from p[0] , so we could have just done:

pipe(p);
// fork child
bytes = read(p[0], buf, 100);

Then, there are essentially three cases:

  1. If there are bytes in the pipe (i.e., at least one byte has been written but not yet read), then the read call will return immediately, and it will return all bytes available up to a maximum of 100 bytes. The return value will be the number of bytes read, and it will always be a positive number between 1 and 100.

  2. If the pipe is empty (no bytes) and the output side has been closed, the buffer won't be touched, and the call will return immediately with return value of 0.

  3. Otherwise, the read call will block until something is written to the pipe or the output side is closed, and then the read call will return immediately using the rules in cases 1 and 2.
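Cases 1 and 2 can be observed inside a single process, without forking. The following is a minimal sketch, not a definitive test: the helper name pipe_read_cases is hypothetical, and it assumes that 150 bytes fit within the pipe's capacity so that the write does not block. Case 3 (blocking) is deliberately avoided by closing the write end before reading.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes 150 bytes into a fresh pipe, closes the
 * write end, then performs three read() calls of up to 100 bytes each.
 * Stores the three return values in results[]; returns 0 on success,
 * -1 if the pipe could not be set up. */
int pipe_read_cases(ssize_t results[3])
{
    int p[2];
    char out[150];
    char in[100];

    if (pipe(p) == -1)
        return -1;

    memset(out, 'x', sizeof out);
    /* Assumption: 150 bytes is far below the pipe capacity, so no blocking. */
    if (write(p[1], out, sizeof out) != (ssize_t)sizeof out) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);  /* no writers remain after this point */

    results[0] = read(p[0], in, sizeof in);  /* case 1: data available */
    results[1] = read(p[0], in, sizeof in);  /* case 1: the remainder */
    results[2] = read(p[0], in, sizeof in);  /* case 2: empty and closed */
    close(p[0]);
    return 0;
}
```

On Linux one would expect the three results to be 100, 50, and 0: two positive counts while data remains, then the end-of-file indication.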

So, if a read() call returns 0, that means the end-of-file was reached, and no more bytes are expected. Waiting for additional data happens automatically, and after the wait, you'll either get data (positive return value) or an end-of-file signal (zero return value). In the special case that another process writes some bytes and then immediately closes (the output side of) the pipe, the next read() call will return a positive value up to the specified count . Subsequent read() calls will continue to return positive values as long as there's more data to read. When the data are exhausted, the read() call will return 0 (since the pipe is closed).

On Linux, the above is always true for pipes and any positive count. There can be differences for things other than pipes. Also, if the count is 0, the read() call will always return immediately with return value 0.

Note that, if you are trying to write code that runs on platforms other than Linux, you may have to be more careful. An implementation is allowed to return a non-zero number of bytes less than the number requested, even if more bytes are available in the pipe. This might mean that there's an implementation-defined limit (so you never get more than 4096 bytes, no matter how many you request, for example) or that this implementation-defined limit changes from call to call (so if you request bytes over a page boundary in a kernel buffer, you only get the end of the page or something). On Linux, there's no limit: the read call will always return everything available up to count, no matter how big count is.
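When a caller needs a specific number of bytes regardless of how an implementation splits them across calls, the usual approach is to loop. The sketch below is one way to do that; the name read_full and its return convention are assumptions made for illustration, not part of any standard API.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until `want` bytes have been
 * stored in buf, end-of-file is reached, or an error occurs.
 * Returns the number of bytes stored (less than `want` only at EOF),
 * or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *buf, size_t want)
{
    size_t got = 0;

    while (got < want) {
        ssize_t n = read(fd, (char *)buf + got, want - got);
        if (n > 0) {
            got += (size_t)n;   /* short read: keep going */
        } else if (n == 0) {
            break;              /* end-of-file: return what we have */
        } else if (errno != EINTR) {
            return -1;          /* genuine error */
        }
        /* EINTR: interrupted by a signal, simply retry */
    }
    return (ssize_t)got;
}
```

Note that on a blocking descriptor this waits until all `want` bytes arrive or the writer closes, so it suits fixed-size records rather than interactive streams.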

Anyway, the idea is that something like the following code should reliably read all bytes from a pipe until the output side is closed, even on platforms other than Linux:

#define _GNU_SOURCE 1
#include <errno.h>
#include <unistd.h>

/* ... */

    while ((count = TEMP_FAILURE_RETRY(read(fd, buffer, sizeof(buffer)))) > 0) {
        // process "count" bytes in "buffer"
    }
    if (count == -1) {
        // handle error
    }
    // otherwise, end of data reached

If the pipe is never closed ("endless" or "continuous" stream), the while loop will run forever because read will block until it can return a non-zero byte count.
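TEMP_FAILURE_RETRY is a glibc macro, so the loop above depends on _GNU_SOURCE. Where that macro is not available, the same retry on EINTR can be written by hand. This is only a sketch: drain_fd is a hypothetical name, and "processing" is reduced to counting bytes.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: reads fd until end-of-file, retrying by hand when a
 * signal interrupts read() (the job TEMP_FAILURE_RETRY does on glibc).
 * Returns the total number of bytes consumed, or -1 on a read error. */
long drain_fd(int fd)
{
    char buffer[4096];
    long total = 0;

    for (;;) {
        ssize_t count = read(fd, buffer, sizeof buffer);
        if (count > 0) {
            /* process "count" bytes in "buffer"; here they are only counted */
            total += count;
        } else if (count == 0) {
            return total;   /* end of data reached */
        } else if (errno != EINTR) {
            return -1;      /* genuine error */
        }
        /* EINTR: fall through and call read() again */
    }
}
```

As with the original loop, this returns only once the write side is closed; on an endless stream it keeps blocking in read().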

Note that the pipe can also be put into a non-blocking mode which changes the behavior substantially, but the above is the default blocking mode behavior.
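For comparison, here is a rough sketch of the non-blocking behavior. With O_NONBLOCK set, an empty pipe whose write side is still open makes read() fail with EAGAIN instead of waiting, so "no data yet" and "end of file" become distinguishable in a single call. The helper names and the numeric return codes are assumptions chosen for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: turns on O_NONBLOCK for fd.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: classifies one read() on a non-blocking descriptor.
 * Returns 1 if data was read (count stored in *nread), 0 at end-of-file,
 * 2 if no data is available yet (EAGAIN / EWOULDBLOCK), -1 on other errors. */
int try_read(int fd, char *buf, size_t size, ssize_t *nread)
{
    ssize_t n = read(fd, buf, size);

    if (n > 0) {
        *nread = n;
        return 1;
    }
    if (n == 0)
        return 0;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return 2;
    return -1;
}
```

A program using this mode would normally wait with poll() or select() rather than calling try_read in a tight loop.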

With respect to your UPD questions:

  • Yes, read holds the program execution until data is available, but NO, it doesn't necessarily wait for count bytes. It will wait for at least one non-empty write to the pipe, and that will wake the process; when the process gets a chance to run, it will return whatever's available up to but not necessarily equal to count bytes. Usually, this means that if another process writes 5 bytes, a blocked read(fd, buffer, 100) call will return 5 and execution will continue.
  • Yes, if read returns 0, it's a signal that there's no more data to be read and the write side of the pipe has been closed (so no more data will ever be available).
  • No, an EOF value does not appear in the buffer. Only bytes read will appear there, and the buffer won't be touched when read() returns 0, so it'll contain whatever was there before the read() call.
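Two of these points, the short read and the untouched buffer, can be checked with a small single-process experiment. This is a sketch under the assumption of Linux pipe behavior; short_read_demo is a hypothetical name, and the '#' fill is just a sentinel chosen for the example.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if both observations hold on this system,
 * 0 if either does not, and -1 if the pipe could not be set up. */
int short_read_demo(void)
{
    int p[2];
    char buffer[100];
    char sentinel[100];
    ssize_t n;
    int short_ok, eof_ok;

    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "hello", 5) != 5) {
        close(p[0]);
        close(p[1]);
        return -1;
    }

    /* Ask for 100 bytes while only 5 are available. */
    n = read(p[0], buffer, sizeof buffer);
    short_ok = (n == 5 && memcmp(buffer, "hello", 5) == 0);

    /* Close the write side, fill the buffer with a sentinel, read again. */
    close(p[1]);
    memset(buffer, '#', sizeof buffer);
    memset(sentinel, '#', sizeof sentinel);
    n = read(p[0], buffer, sizeof buffer);
    eof_ok = (n == 0 && memcmp(buffer, sentinel, sizeof buffer) == 0);

    close(p[0]);
    return short_ok && eof_ok;
}
```

The first read returns as soon as the 5 bytes are there, and the second returns 0 with no EOF marker or any other byte stored in the buffer.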

With respect to your UPD2 comment:

Yes, on Linux, EOF is a constant equal to the integer -1 . (Technically, according to the C99 standard, it is an integer constant equal to a negative value; maybe someone knows of a platform where it's something other than -1 .) This constant is not used by the read() interface, and it is certainly not written into the buffer. While read() returns -1 in case of error, it would be considered bad practice to compare the return value from read() with EOF instead of -1. As you note, the EOF value is really only used for C library functions like getc() and getchar() to distinguish the end of file from a successfully read character.
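To illustrate the getc() side of this, here is a minimal sketch of a byte-counting loop. The name count_bytes is hypothetical. The important detail is that the variable receiving getc()'s result is an int, so that EOF stays distinct from every possible byte value, including 0xFF.

```c
#include <stdio.h>

/* Hypothetical helper: counts the bytes in stream using getc().
 * ch must be an int: getc() returns either an unsigned char value
 * converted to int, or EOF, and a plain char could not hold both. */
long count_bytes(FILE *stream)
{
    int ch;
    long n = 0;

    while ((ch = getc(stream)) != EOF)
        n++;
    return n;
}
```

If ch were declared as char, a 0xFF byte in the input could compare equal to EOF on platforms where char is signed, ending the loop early.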

