Today at 5am I read an article about the read system call, and things became significantly clearer for me.
ssize_t read(int fd, void *buf, size_t count);
The construction of a *nix-like operating system becomes amazing in its simplicity. There is a file interface for any entity: just ask to write some data from this fd interface into the memory at the *buf pointer. It is all the same for network, files, and streams.
But a question arises. How can I distinguish two cases? 1) The stream is empty, so the program needs to wait for new data. 2) The stream is closed, so the program needs to exit.
Here is a scenario: the program reads STDIN in a loop, and this STDIN is redirected by a pipe. After some text_data, what appears: EOF in memory, or 0 as the result of the read call? This is unclear in the case of endless or continuous streams.
UPD After speaking with @Bailey Kocin and reading some docs, I have this understanding. Correct me if I'm wrong.
read holds the program execution and waits for count bytes.
When count bytes appear, read writes them into buf and execution continues.
When the stream is closed, read returns 0, and that is a signal that the program may finish.
Question: does EOF appear in buf?
UPD2 EOF is a constant that can appear in the output of the getc function:
int ch = getc(fp);  /* fp is assumed to be an open FILE *; ch is int so it can hold EOF */
while (ch != EOF) {
    /* display contents of file on screen */
    putchar(ch);
    ch = getc(fp);
}
But in the case of read, the EOF value does not appear in buf. The read system call signals the end of the file by returning 0, instead of writing the EOF constant into the data area as in the case of getc.
EOF is a constant that varies between systems, and it is used for getc.
Let's deal first with your original question. Note that man 7 pipe should give some useful information on this.
Say we have the standard input redirected to the input side of a descriptor created by a pipe call, as in:
pipe(p);
// ... fork a child to write to the output side of the pipe ...
dup2(p[0], 0); // redirect standard input to input side
and we call:
bytes = read(0, buf, 100);
First, note that this behaves no differently than simply reading directly from p[0], so we could have just done:
pipe(p);
// fork child
bytes = read(p[0], buf, 100);
Then, there are essentially three cases:
1. If there are bytes in the pipe (i.e., at least one byte has been written but not yet read), then the read call will return immediately, and it will return all bytes available up to a maximum of 100 bytes. The return value will be the number of bytes read, and it will always be a positive number between 1 and 100.
2. If the pipe is empty (no bytes) and the output side has been closed, the buffer won't be touched, and the call will return immediately with return value of 0.
3. Otherwise, the read call will block until something is written to the pipe or the output side is closed, and then the read call will return immediately using the rules in cases 1 and 2.
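The first two cases can be observed inside a single process, without forking a child. The following is only a minimal sketch, assuming Linux or another POSIX system; the helper name demo_pipe_cases is hypothetical and exists purely for illustration.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: exercises cases 1 and 2 on a fresh pipe.
 * Stores the two read() results in *first and *second.
 * Returns 0 on success, -1 if pipe() or write() failed. */
int demo_pipe_cases(ssize_t *first, ssize_t *second)
{
    int p[2];
    char buf[100];

    assert(first != NULL && second != NULL);
    if (pipe(p) == -1)
        return -1;

    /* Case 1: five bytes are waiting, so read() returns at once with 5,
     * even though up to 100 bytes were requested. */
    if (write(p[1], "hello", 5) != 5) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    *first = read(p[0], buf, sizeof(buf));

    /* Case 2: the pipe is now empty and the write side is closed,
     * so read() returns 0 instead of blocking. */
    close(p[1]);
    *second = read(p[0], buf, sizeof(buf));

    close(p[0]);
    return 0;
}
```

Case 3 is deliberately left out: with the write side still open and nothing in the pipe, the second read() would block this single-process sketch forever.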
So, if a read() call returns 0, that means the end-of-file was reached, and no more bytes are expected. Waiting for additional data happens automatically, and after the wait, you'll either get data (positive return value) or an end-of-file signal (zero return value). In the special case that another process writes some bytes and then immediately closes (the output side of) the pipe, the next read() call will return a positive value up to the specified count. Subsequent read() calls will continue to return positive values as long as there's more data to read. When the data are exhausted, the read() call will return 0 (since the pipe is closed).
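The "write some bytes, then close" case can be sketched as follows: data written before the close stays readable, and 0 shows up only after it has been drained. This is an illustrative sketch, assuming Linux pipe behaviour as described above; drain_closed_pipe is a hypothetical name.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: writes 10 bytes, closes the write side, then reads
 * in chunks of 4 and records each read() return value in results[].
 * Returns the number of read() calls made (including the last one, which
 * returned 0 or -1), or -1 if the setup failed. */
int drain_closed_pipe(ssize_t results[], int max_calls)
{
    int p[2];
    char buf[4];
    int calls = 0;

    assert(results != NULL && max_calls > 0);
    if (pipe(p) == -1)
        return -1;
    if (write(p[1], "0123456789", 10) != 10) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    close(p[1]);  /* writer is done: buffered data stays readable */

    while (calls < max_calls) {
        ssize_t n = read(p[0], buf, sizeof(buf));
        results[calls++] = n;
        if (n <= 0)
            break;  /* 0 = end-of-file, -1 = error */
    }
    close(p[0]);
    return calls;
}
```

On Linux the recorded values should be 4, 4, 2 and then 0; other platforms are allowed to split the data differently, but the final 0 still marks the end.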
On Linux, the above is always true for pipes and any positive count. There can be differences for things other than pipes. Also, if the count is 0, the read() call will always return immediately with return value 0. Note that, if you are trying to write code that runs on platforms other than Linux, you may have to be more careful. An implementation is allowed to return a non-zero number of bytes less than the number requested, even if more bytes are available in the pipe -- this might mean that there's an implementation-defined limit (so you never get more than 4096 bytes, no matter how many you request, for example) or that this implementation-defined limit changes from call to call (so if you request bytes over a page boundary in a kernel buffer, you only get the end of the page or something). On Linux, there's no limit -- the read call will always return everything available up to count, no matter how big count is.
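Because a single read() may legitimately return fewer bytes than requested, code that needs a fixed number of bytes usually loops. The following is a minimal sketch of such a loop, not a definitive implementation; read_full is a hypothetical helper name, and retrying on EINTR is a design choice assumed here.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: keeps calling read() until `want` bytes have
 * arrived, end-of-file is reached, or an error occurs.
 * Returns the number of bytes stored in buf (less than `want` only at
 * end-of-file), or -1 on error with errno set by read(). */
ssize_t read_full(int fd, void *buf, size_t want)
{
    char *out = buf;
    size_t got = 0;

    assert(buf != NULL);
    while (got < want) {
        ssize_t n = read(fd, out + got, want - got);
        if (n == 0)
            break;              /* end-of-file: no more data will come */
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        got += (size_t)n;       /* short read: keep going */
    }
    return (ssize_t)got;
}
```

A caller can then treat a return value smaller than want as "the stream ended early" rather than as a short read.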
Anyway, the idea is that something like the following code should reliably read all bytes from a pipe until the output side is closed, even on platforms other than Linux:
#define _GNU_SOURCE 1
#include <errno.h>
#include <unistd.h>
/* ... */
while ((count = TEMP_FAILURE_RETRY(read(fd, buffer, sizeof(buffer)))) > 0) {
    // process "count" bytes in "buffer"
}
if (count == -1) {
    // handle error
}
// otherwise, end of data reached
If the pipe is never closed ("endless" or "continuous" stream), the while loop will run forever because read will block until it can return a non-zero byte count.
Note that the pipe can also be put into a non-blocking mode which changes the behavior substantially, but the above is the default blocking mode behavior.
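For comparison, here is a rough sketch of how the two situations from the question look in non-blocking mode: an empty pipe with a live writer yields -1 with errno set to EAGAIN, while a closed pipe still yields 0. The names set_nonblocking, try_read and enum read_state are hypothetical; only fcntl, read and the errno values are real POSIX interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum read_state { GOT_DATA, NO_DATA_YET, END_OF_STREAM, READ_ERROR };

/* Hypothetical helper: switch fd to non-blocking mode.
 * Returns 0 on success, -1 on failure. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: one read() attempt on a non-blocking descriptor,
 * classified into the states a caller cares about.
 * *n receives the raw read() return value. */
enum read_state try_read(int fd, void *buf, size_t count, ssize_t *n)
{
    assert(buf != NULL && n != NULL && count > 0);
    *n = read(fd, buf, count);
    if (*n > 0)
        return GOT_DATA;
    if (*n == 0)
        return END_OF_STREAM;   /* write side closed and pipe drained */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return NO_DATA_YET;     /* pipe empty, but a writer still exists */
    return READ_ERROR;
}
```

In practice a non-blocking reader would wait with poll or select between attempts rather than calling try_read in a tight loop.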
With respect to your UPD questions:
Yes, read holds the program execution until data is available, but NO, it doesn't necessarily wait for count bytes. It will wait for at least one non-empty write to the pipe, and that will wake the process; when the process gets a chance to run, read will return whatever's available up to but not necessarily equal to count bytes. Usually, this means that if another process writes 5 bytes, a blocked read(fd, buffer, 100) call will return 5 and execution will continue.
Yes, if read returns 0, it's a signal that there's no more data to be read and the write side of the pipe has been closed (so no more data will ever be available).
No, an EOF value does not appear in the buffer. Only bytes read will appear there, and the buffer won't be touched when read() returns 0, so it'll contain whatever was there before the read() call.
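The last point can be checked with a small experiment: fill the buffer with a known byte, read from an already-closed pipe, and look at the buffer afterwards. This is a sketch assuming the Linux behaviour described above; buffer_untouched_at_eof is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: fills buf with 'X', reads from a pipe whose write
 * side is already closed, and reports whether any byte changed.
 * Stores the read() result in *result.
 * Returns 1 if the buffer is unchanged, 0 if it was modified,
 * -1 if pipe() failed. */
int buffer_untouched_at_eof(ssize_t *result)
{
    int p[2];
    char buf[16];
    size_t i;

    assert(result != NULL);
    if (pipe(p) == -1)
        return -1;
    close(p[1]);                      /* nothing written, writer gone */

    memset(buf, 'X', sizeof(buf));    /* known contents before read() */
    *result = read(p[0], buf, sizeof(buf));
    close(p[0]);

    for (i = 0; i < sizeof(buf); i++)
        if (buf[i] != 'X')
            return 0;                 /* something was written into buf */
    return 1;                         /* no EOF marker, no data, nothing */
}
```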
With respect to your UPD2 comment:
Yes, on Linux, EOF is a constant equal to the integer -1. (Technically, according to the C99 standard, it is an integer constant equal to a negative value; maybe someone knows of a platform where it's something other than -1.) This constant is not used by the read() interface, and it is certainly not written into the buffer. While read() returns -1 in case of error, it would be considered bad practice to compare the return value from read() with EOF instead of -1. As you note, the EOF value is really only used for C library functions like getc() and getchar() to distinguish the end of file from a successfully read character.