Reading/writing exactly N bytes from/to a file descriptor with C on Unix

Question

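I know that the read / write C functions from <unistd.h> are not guaranteed to read/write exactly N bytes as requested by the size_t nbyte argument (especially for sockets).

How can I read/write a full buffer from/to a file (or socket) descriptor?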

Answer 1

On success, both read and write return a ssize_t containing the number of bytes read/written. You can use it to construct a loop:

A reliable read():

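#include <unistd.h>

ssize_t readall(int fd, void *buff, size_t nbyte) {
    char *p = buff;     /* arithmetic on void * is not standard C */
    size_t nread = 0;
    ssize_t res;        /* signed, so that -1 can be detected */
    while (nread < nbyte) {
        res = read(fd, p + nread, nbyte - nread);
        if (res == 0) break;        /* end of input */
        if (res == -1) return -1;   /* error; errno is set */
        nread += (size_t)res;
    }
    return (ssize_t)nread;
}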

A reliable write() (almost the same):

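ssize_t writeall(int fd, const void *buff, size_t nbyte) {
    const char *p = buff;   /* arithmetic on void * is not standard C */
    size_t nwrote = 0;
    ssize_t res;            /* signed, so that -1 can be detected */
    while (nwrote < nbyte) {
        res = write(fd, p + nwrote, nbyte - nwrote);
        if (res == 0) break;        /* nothing written; do not loop forever */
        if (res == -1) return -1;   /* error; errno is set */
        nwrote += (size_t)res;
    }
    return (ssize_t)nwrote;
}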

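Basically, each function keeps reading/writing until the total number of bytes transferred equals nbyte (or until end of input or an error stops it early).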
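As a variation on the loop described above, the sketch below also retries when read() is interrupted by a signal (errno == EINTR). The name readall_retry is hypothetical, and whether retrying on EINTR is the right choice depends on how your program uses signals; treat it as an illustration, not a drop-in replacement.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read until nbyte bytes have arrived, end of input is reached, or a
   real error occurs.  Returns the number of bytes read (less than
   nbyte only at end of input), or -1 with errno set on error.
   Hypothetical helper, for illustration only. */
ssize_t readall_retry(int fd, void *buff, size_t nbyte)
{
    char  *p = buff;    /* char pointer, so pointer arithmetic is standard C */
    size_t nread = 0;

    while (nread < nbyte) {
        ssize_t res = read(fd, p + nread, nbyte - nread);
        if (res > 0)
            nread += (size_t)res;
        else if (res == 0)
            break;              /* end of input */
        else if (errno != EINTR)
            return -1;          /* real error; errno was set by read() */
        /* EINTR: interrupted by a signal, so simply try again */
    }
    return (ssize_t)nread;
}
```

A caller compares the return value against nbyte: a smaller non-negative value means the input ended early.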

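Note that this answer uses only <unistd.h> functions, on the assumption that there is a reason to do so. If you can also use <stdio.h>, see John Bollinger's answer, which uses fdopen and setvbuf, followed by fread / fwrite. Also see the answer by Blabbo is Verbose for a more featureful read_range function.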

Answer 2

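That read() and write() do not guarantee to transfer the full number of bytes requested is a feature, not a shortcoming. If that feature gets in the way in your particular application, it is probably better to use the existing facilities of the standard library to deal with it than to roll your own (though I have certainly rolled my own from time to time).

Specifically, if you have a file descriptor on which you always want to transfer exact numbers of bytes, you should consider using fdopen() to wrap it in a stream and then performing I/O with fread() and fwrite(). You can also use setvbuf() to avoid having an intermediate buffer. As a possible bonus, you can then also use other stream functions, such as fgets() and fprintf().

Example: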

int my_fd = open_some_resource();
// if (my_fd < 0) ...
FILE *my_file = fdopen(my_fd, "r+b");
// if (my_file == NULL) ...
int rval = setvbuf(my_file, NULL, _IONBF, 0);
// if (rval != 0) ...

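Note that it is thereafter best to use only the stream, not the underlying file descriptor, and that is the main drawback of this approach. On the other hand, you can probably allow the FD to be lost, because closing the stream will also close the underlying FD.

Nothing particularly special is required to make fread() and fwrite() transfer full-buffer units (or fail):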

char buffer[BUF_SIZE];
size_t blocks = fread(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...

// ...

blocks = fwrite(buffer, BUF_SIZE, 1, my_file);
// if (blocks != 1) ...

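Do note, however, that you must get the second and third arguments in the right order. The second is the transfer unit size, and the third is the number of units to transfer. Partial units will not be transferred unless an error or end-of-file occurs. Specifying the transfer unit as the full number of bytes you want to transfer, and (therefore) asking for exactly one unit, is what achieves the semantics you asked about.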
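To tie the pieces of this answer together, here is a minimal sketch that wraps a descriptor in an unbuffered stream and reads exactly one unit from it. The names wrap_fd_unbuffered and read_exact_stream, and the 0 / 1 / -1 return convention, are hypothetical; feof() is used to tell a short read at end of file apart from an error.

```c
#define _POSIX_C_SOURCE 200809L   /* exposes fdopen() in <stdio.h> */
#include <stdio.h>

/* Wrap fd in an unbuffered stream.  Returns NULL on failure.
   Note: if setvbuf() fails, the stream is closed, which closes fd too.
   Hypothetical helper, for illustration only. */
FILE *wrap_fd_unbuffered(int fd, const char *mode)
{
    FILE *stream = fdopen(fd, mode);
    if (stream == NULL)
        return NULL;
    if (setvbuf(stream, NULL, _IONBF, 0) != 0) {
        fclose(stream);
        return NULL;
    }
    return stream;
}

/* Read exactly nbyte bytes from stream into buf.
   Returns 0 on success, 1 if end of file arrived first, -1 on error. */
int read_exact_stream(FILE *stream, void *buf, size_t nbyte)
{
    if (nbyte == 0)
        return 0;                   /* nothing to transfer */

    /* Unit size nbyte, count 1: fread() returns 1 only if the
       whole unit was read. */
    if (fread(buf, nbyte, 1, stream) == 1)
        return 0;

    return feof(stream) ? 1 : -1;
}
```

When the call does not succeed, the contents of buf should be treated as unspecified, since part of a unit may have been stored before the failure.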

Answer 3

You use a loop.

For example, with proper error checking:

#include <errno.h>
#include <unistd.h>

/** Read a specific number of bytes from a file or socket descriptor
 * @param fd        Descriptor
 * @param dst       Buffer to read data into
 * @param minbytes  Minimum number of bytes to read
 * @param maxbytes  Maximum number of bytes to read
 * @return          Exact number of bytes read.
 *                  errno is always set by this call.
 *                  It is set to zero if an acceptable number of bytes
 *                  was read, and to nonzero otherwise.
 *                  If there was not enough data to read, errno == ENODATA.
*/
size_t  read_range(const int fd, void *const dst, const size_t minbytes, const size_t maxbytes)
{
    if (fd == -1) {
        errno = EBADF;
        return 0;
    } else
    if (!dst || minbytes > maxbytes) {
        errno = EINVAL;
        return 0;
    }

    char       *buf = (char *)dst;
    char *const end = (char *)dst + minbytes;
    char *const lim = (char *)dst + maxbytes;

    while (buf < end) {
        ssize_t n = read(fd, buf, (size_t)(lim - buf));
        if (n > 0) {
            buf += n;
        } else
        if (n == 0) {
            /* Premature end of input */
            errno = ENODATA;  /* Example only; use what you deem best */
            return (size_t)(buf - (char *)dst);
        } else
        if (n != -1) {
            /* C library or kernel bug */
            errno = EIO;
            return (size_t)(buf - (char *)dst);
        } else {
            /* Error, interrupted by signal delivery, or nonblocking I/O would block. */
            return (size_t)(buf - (char *)dst);
        }
    }

    /* At least minbytes, up to maxbytes received. */
    errno = 0;
    return (size_t)(buf - (char *)dst);
}

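Some people do find it odd that it clears errno to zero on a successful call, but that is perfectly acceptable in both standard C and POSIX C.

Here, it means that the typical use case is simple and robust. For example,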

    struct message  msgs[MAX_MSGS];

    size_t  bytes = read_range(fd, msgs, sizeof msgs[0], sizeof msgs);
    if (errno) {
        /* Oops, things did not go as we expected.  Deal with it.
           If bytes > 0, we do have that many bytes in msgs[].
        */
    } else {
        /* We have bytes bytes in msgs.
           bytes >= sizeof msgs[0] and bytes <= sizeof msgs.
        */
    }

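If you have a pattern with fixed- or variable-size messages, and a function that consumes them one by one, do not assume that the best option is to try to read exactly one message at a time, because it is not.

That is also why the example above uses minbytes and maxbytes instead of a single exactly_this_many_bytes parameter.

A much better pattern is to have a larger buffer, in which you memmove() the data only when you have to (because you are running out of room, or because the next message is not sufficiently aligned).

As an example, let's say you have a stream socket or file descriptor where each incoming message starts with a three-byte header: the first byte identifies the message type, and the next two bytes (say, less significant byte first) give the number of data payload bytes associated with the message. This means that the maximum total length of a message is 1+2+65535 = 65538 bytes.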
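To make the header layout concrete, the following sketch decodes the three header bytes. The names msg_header, decode_header and message_length are hypothetical, and the byte order is the low-byte-first example just described.

```c
#include <stddef.h>

/* Hypothetical decoded form of the three-byte header described above. */
struct msg_header {
    unsigned char type;     /* message type, from byte 0 */
    size_t        datalen;  /* payload length 0..65535, from bytes 1 and 2 */
};

/* Decode a header; hdr must point to at least three readable bytes. */
struct msg_header decode_header(const unsigned char *hdr)
{
    struct msg_header h;
    h.type = hdr[0];
    /* Low byte first.  The parentheses around the shift are required,
       because << binds less tightly than +. */
    h.datalen = (size_t)hdr[1] + ((size_t)hdr[2] << 8);
    return h;
}

/* Total message length: three header bytes plus the payload. */
size_t message_length(const unsigned char *hdr)
{
    return 3 + decode_header(hdr).datalen;
}
```

With both length bytes set to 0xFF, message_length() yields the 65538-byte maximum computed above.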

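To receive the messages efficiently, you use a dynamically allocated buffer. The buffer size is a software engineering question: other than that it must be at least 65538 bytes, its size, and even whether it should grow and shrink dynamically, depends on the situation. So, let's just assume we have unsigned char *data; pointing to an already allocated buffer of size size_t size;.

The loop itself could look something like the following: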

    size_t  head = 0;  /* Offset to current message */
    size_t  tail = 0;  /* Offset to first unused byte in buffer */
    size_t  mlen = 0;  /* Total length of the current message; 0 is "unknown"*/

    while (1) {

        /* Message processing loop. */
        while (head + 3 <= tail) {

            /* Verify we know the total length of the message
               that starts at offset head. */
            if (!mlen)
                mlen = 3 + (size_t)(data[head + 1])
                         + ((size_t)(data[head + 2]) << 8);

            /* If that message is not yet complete, we cannot process it. */
            if (head + mlen > tail)
                break;

            /*             type        datalen,  pointer to data */
            handle_message(data[head], mlen - 3, data + head + 3);

            /* Skip message in buffer. */
            head += mlen;

            /* Since we do not know the length of the next message,
               or rather, the current message starting at head,
               we do need to reset mlen to "unknown", 0. */
            mlen  = 0;
        }

        /* At this point, the buffer contains less than one full message.
           Whether it is better to always move a partial leftover message
           to the beginning of the buffer, or only do so if the buffer
           is full, depends on the workload and buffer size.
           The following one may look complex, but it is actually simple.
           If the current start of the buffer is past the halfway mark,
           or there is no more room at the end of the buffer, we do the move.
           Only if the current message starts in the initial half, and
           when there is room at the end of the buffer, we leave it be.
           But first: If we have no data in the buffer, it is always best
           to start filling it from the beginning.
        */
        if (head >= tail) {
            head = 0;
            tail = 0;
        } else
        if (head >= size/2 || tail >= size) {
            memmove(data, data + head, tail - head);
            tail -= head;
            head = 0;
        }

        /* We do not have a complete message, but there
           is room in the buffer (assuming size >= 65538),
           we need to now read more data into the buffer. */
        ssize_t  n = read(sourcefd, data + tail, size - tail);
        if (n > 0) {
            tail += n;

            /* Check if it completed one or more messages. */
            continue;

        } else
        if (n == 0) {
            /* End of input.  If buffer is empty, that's okay. */
            if (head >= tail)
                break;

            /* Ouch: We have partial message in the buffer,
                     but there will be no more incoming data! */
            ISSUE_WARNING("Discarding %zu byte partial message due to end of input.\n", tail - head);
            break;

        } else
        if (n != -1) {
            /* This should not happen.  If it does, it is a C library
               or kernel bug.  We treat it as fatal. */
            ISSUE_ERROR("read() returned %zd; dropping connection.\n", n);
            break;

        } else
        if (errno != EINTR) {
            /* Everything except EINTR indicates an error to us; we do
               assume that sourcefd is blocking (not nonblocking). */
            ISSUE_ERROR("read() failed with errno %d (%s); dropping connection.\n", errno, strerror(errno));
            break;
        }

        /* The case n == -1, errno == EINTR usually occurs when a signal
           was delivered to a handler using this thread, and that handler
           was installed without SA_RESTART.  Depending on what kind of
           a device or socket sourcefd is, there could be additional cases;
           but in general, it just means "something unrelated happened,
           but you were to be notified about it, so EINTR you get".
           Simply put, EINTR is not really an error, just like
           EWOULDBLOCK/EAGAIN is not an error for nonblocking descriptors,
           they're just easiest to treat as an "error-like situation" in C.
        */
    }

    /* close(sourcefd); */

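Notice how the loop does not actually try to read any specific amount of data? It just reads as much as it can, and processes it.

Could one read such messages precisely, by first reading exactly the three-byte header, then exactly the data payload? Sure, but that means you make an awful lot of syscalls; at least two per message. If the messages are common, you probably do not want to do that because of the syscall overhead.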
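For comparison, the header-then-payload approach might look like the sketch below. The names read_full and read_one_message are hypothetical, the header layout is the three-byte example from above, and EINTR is retried on the assumption that the descriptor is blocking.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly len bytes.  Returns 0 on success, 1 if end of input
   arrived first, -1 on error (errno set by read()). */
static int read_full(int fd, unsigned char *buf, size_t len)
{
    size_t have = 0;
    while (have < len) {
        ssize_t n = read(fd, buf + have, len - have);
        if (n > 0)
            have += (size_t)n;
        else if (n == 0)
            return 1;           /* end of input */
        else if (errno != EINTR)
            return -1;          /* real error */
        /* EINTR: interrupted by a signal, try again */
    }
    return 0;
}

/* Read one message: first the header, then the payload.  A message
   with a nonempty payload costs at least two read() calls.
   payload must have room for 65535 bytes.
   Returns 0 on success, 1 if the input ended before a complete
   message was read (including a clean end between messages),
   and -1 on error. */
int read_one_message(int fd, unsigned char *type,
                     unsigned char *payload, size_t *datalen)
{
    unsigned char hdr[3];
    int rc = read_full(fd, hdr, sizeof hdr);
    if (rc != 0)
        return rc;

    *type    = hdr[0];
    *datalen = (size_t)hdr[1] + ((size_t)hdr[2] << 8);  /* low byte first */
    return read_full(fd, payload, *datalen);
}
```

This is easier to follow than the buffered loop, which is the trade-off: simplicity in exchange for more system calls per message.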

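Could one use the available buffer more carefully, and strip the type and data payload length of the next message from the buffer as soon as possible? Well, that is the sort of question to discuss with colleagues or developers who have written such code before. There are positives (mainly, you save three bytes) and negatives (added code complexity, which always makes code harder to maintain in the long term, and risks introducing bugs). On a microcontroller with just a 128-byte buffer for incoming command messages, I would probably do it; but on a desktop or server, I would prefer a buffer of anywhere from a few hundred kilobytes up to a few megabytes for such code (since the memory "waste" is usually covered by the smaller number of syscalls, especially when processing lots of messages). No quick answers! :)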
