What is the correct way of reading from a TCP socket in C/C++?

Question

Here is my code:

// Not all headers are relevant to the code snippet.
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

char *buffer;
stringstream readStream;
bool readData = true;

while (readData)
{
    cout << "Receiving chunk... ";

    // Read a bit at a time, eventually "end" string will be received.
    bzero(buffer, BUFFER_SIZE);
    int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);
    if (readResult < 0)
    {
        THROW_VIMRID_EX("Could not read from socket.");
    }

    // Concatenate the received data to the existing data.
    readStream << buffer;

    // Continue reading while end is not found.
    readData = readStream.str().find("end;") == string::npos;

    cout << "Done (length: " << readStream.str().length() << ")" << endl;
}

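As you can tell, it's a bit of a mix of C and C++. BUFFER_SIZE is 256 - should I increase the size? If so, what to? Does it matter?

I know that if "end" is not received for whatever reason, this will be an endless loop, which is bad - so if you can suggest a better way, please do.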

Answer 1

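Without knowing your full application it is hard to say what the best approach is, but a common technique is to use a header that starts with a fixed-length field giving the length of the rest of your message.

Assume that your header consists only of a 4-byte integer that gives the length of the rest of your message. Then simply do the following.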

// This assumes buffer is at least x bytes long,
// and that the socket is blocking.
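void ReadXBytes(int socket, unsigned int x, void* buffer)
{
    // Arithmetic on void* is not standard C++, so step through a char*.
    char* bytes = static_cast<char*>(buffer);
    unsigned int bytesRead = 0;
    while (bytesRead < x)
    {
        // read() returns ssize_t: -1 on error, 0 when the peer has closed.
        ssize_t result = read(socket, bytes + bytesRead, x - bytesRead);
        if (result < 1)
        {
            // Throw your error.
        }

        bytesRead += static_cast<unsigned int>(result);
    }
}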

Then later in the code:

unsigned int length = 0;
char* buffer = 0;
// we assume that sizeof(length) will return 4 here.
ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// Then process the data as needed.

delete [] buffer;

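This makes a few assumptions:

ints are the same size on the sender and the receiver.
Endianness is the same on the sender and the receiver.
You have control of the protocol on both sides.
When you send a message, you can calculate its length up front.

Since you usually want to know the exact size of the integers you send across the network, define them in a header file and use them explicitly, such as: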

// These typedefs will vary across different platforms
// such as linux, win32, OS/X etc, but the idea
// is that a Int8 is always 8 bits, and a UInt32 is always
// 32 bits regardless of the platform you are on.
// These vary from compiler to compiler, so you have to 
// look them up in the compiler documentation.
typedef char Int8;
typedef short int Int16;
typedef int Int32;

typedef unsigned char UInt8;
typedef unsigned short int UInt16;
typedef unsigned int UInt32;

This would change the above to:

UInt32 length = 0;
char* buffer = 0;

ReadXBytes(socketFileDescriptor, sizeof(length), (void*)(&length));
buffer = new char[length];
ReadXBytes(socketFileDescriptor, length, (void*)buffer);

// process

delete [] buffer;
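
The first two assumptions (same integer size, same endianness) can be dropped by sending the length as exactly four bytes in network byte order. Here is a minimal sketch of that idea; encodeLength and decodeLength are hypothetical helper names, not part of the code above, and htonl/ntohl are assumed to be available from <arpa/inet.h> as on POSIX systems.

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 32-bit length as 4 bytes, most significant
// byte first (network byte order), regardless of the host's endianness.
void encodeLength(std::uint32_t length, unsigned char* out)
{
    assert(out != nullptr);
    const std::uint32_t wire = htonl(length);
    std::memcpy(out, &wire, sizeof(wire));
}

// Hypothetical helper: the inverse of encodeLength.
std::uint32_t decodeLength(const unsigned char* in)
{
    assert(in != nullptr);
    std::uint32_t wire = 0;
    std::memcpy(&wire, in, sizeof(wire));
    return ntohl(wire);
}
```

The receiver would read the four header bytes with ReadXBytes into an unsigned char array, pass them to decodeLength, and only then allocate the payload buffer.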

I hope this helps.

Answer 2

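A few pointers:

You need to handle a return value of 0, which tells you that the remote host closed the socket.

For non-blocking sockets, you also need to check an error return value (-1) and make sure that errno isn't EAGAIN (or EWOULDBLOCK), which is expected when no data is available yet.

You definitely need better error handling - you're potentially leaking the buffer pointed to by 'buffer'. Which, I noticed, you don't allocate anywhere in this code snippet.

Someone else made a good point about how your buffer is not a null-terminated C string if your read() fills the entire buffer. That is indeed a problem, and a serious one.

Your buffer size is a bit small, but it should work as long as you don't try to read more than 256 bytes, or whatever you allocate for it.

If you're worried about getting into an infinite loop when the remote host sends you a malformed message (a potential denial-of-service attack), then you should use select() with a timeout on the socket to check for readability, read only if data is available, and bail out if select() times out.

Something like this might work for you: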

fd_set read_set;
struct timeval timeout;

timeout.tv_sec = 60; // Time out after a minute
timeout.tv_usec = 0;

FD_ZERO(&read_set);
FD_SET(socketFileDescriptor, &read_set);

int r=select(socketFileDescriptor+1, &read_set, NULL, NULL, &timeout);

if( r<0 ) {
    // Handle the error
}

if( r==0 ) {
    // Timeout - handle that. You could try waiting again, close the socket...
}

if( r>0 ) {
    // The socket is ready for reading - call read() on it.
}

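Depending on the volume of data you expect to receive, the way you repeatedly scan the entire message for the "end;" token is very inefficient. This is better done with a state machine (the states being 'e'->'n'->'d'->';') so that you only look at each incoming character once.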
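
A rough sketch of such a state machine, in case it helps. EndTokenMatcher is a made-up name for illustration; the only state it keeps is how many characters of "end;" have matched so far, so it carries over from one read() chunk to the next.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: recognizes the terminator "end;" one byte at a time,
// remembering progress between calls so earlier data is never rescanned.
class EndTokenMatcher {
public:
    // Feeds a single byte; returns true when it completes the token.
    bool feed(char c)
    {
        static const char kToken[] = {'e', 'n', 'd', ';'};
        if (c == kToken[matched_]) {
            ++matched_;
        } else {
            // All characters of "end;" are distinct, so after a mismatch
            // the only possible partial match is a fresh 'e'.
            matched_ = (c == kToken[0]) ? 1 : 0;
        }
        if (matched_ == sizeof(kToken)) {
            matched_ = 0;
            return true;
        }
        return false;
    }

    // Feeds a whole chunk; returns the offset just past the token,
    // or std::string::npos if the token has not been completed yet.
    std::size_t feed(const char* data, std::size_t size)
    {
        assert(data != nullptr || size == 0);
        for (std::size_t i = 0; i < size; ++i) {
            if (feed(data[i])) {
                return i + 1;
            }
        }
        return std::string::npos;
    }

private:
    std::size_t matched_ = 0;  // number of token bytes matched so far
};
```

In the read loop you would pass each received chunk (with the byte count read() returned) to feed() and stop as soon as the result is not npos. The token may be split across two chunks and is still found.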

And seriously, you should consider finding a library to do all this for you. It's not easy to get right.

Answer 3

If you actually create the buffer as per dirks' suggestion, then:

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE);

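may completely fill the buffer, possibly overwriting the terminating zero character that you depend on when extracting to a stringstream. You need: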

  int readResult = read(socketFileDescriptor, buffer, BUFFER_SIZE - 1 );
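
Another option, sketched below, is not to rely on a terminating zero at all and to append exactly the number of bytes read() reported. appendChunk is a hypothetical helper, and it collects into a std::string rather than the stringstream from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>  // ssize_t
#include <unistd.h>     // read

// Hypothetical helper: reads one chunk from fd and appends exactly the bytes
// received to out, so embedded or missing '\0' bytes do not matter.
// Returns read()'s result: >0 bytes appended, 0 at end of stream, -1 on error.
ssize_t appendChunk(int fd, std::string& out)
{
    assert(fd >= 0);
    char buffer[256];
    const ssize_t n = read(fd, buffer, sizeof(buffer));
    if (n > 0) {
        out.append(buffer, static_cast<std::size_t>(n));
    }
    return n;
}
```

With this the whole buffer can be handed to read(), since nothing treats it as a C string afterwards.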

Answer 4

1) Others (especially dirkgently) have noted that buffer needs to be given some memory. For smallish values of N (say, N <= 4096), you can also allocate it on the stack:

#define BUFFER_SIZE 4096
char buffer[BUFFER_SIZE];

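This saves you the worry of making sure you delete[] the buffer should an exception be thrown.

But remember that stacks are finite in size (so are heaps, but stacks are far more limited), so you don't want to put too much there.

2) On a -1 return code, you should not simply return immediately (throwing an exception immediately is even more sketchy). There are certain normal conditions that you need to handle if your code is to be anything more than a short homework assignment. For example, EAGAIN may be returned in errno if no data is currently available on a non-blocking socket. Have a look at the man page for read(2).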
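
To make point 2 concrete, here is a sketch of a read wrapper that separates the "normal" -1 cases from real errors. ReadStatus and readSome are hypothetical names for illustration; the errno values are the ones documented in read(2).

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/types.h>  // ssize_t
#include <unistd.h>     // read

enum class ReadStatus { Data, WouldBlock, Closed, Error };

// Hypothetical wrapper around read(2). On Data, received holds the number
// of bytes stored in buffer; for every other status it is 0.
ReadStatus readSome(int fd, char* buffer, std::size_t capacity, std::size_t& received)
{
    assert(buffer != nullptr && capacity > 0);
    received = 0;
    for (;;) {
        const ssize_t n = read(fd, buffer, capacity);
        if (n > 0) {
            received = static_cast<std::size_t>(n);
            return ReadStatus::Data;
        }
        if (n == 0) {
            return ReadStatus::Closed;  // peer closed the connection
        }
        if (errno == EINTR) {
            continue;  // interrupted by a signal before any data: retry
        }
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return ReadStatus::WouldBlock;  // non-blocking fd, nothing yet
        }
        return ReadStatus::Error;  // a genuine failure; inspect errno
    }
}
```

Only ReadStatus::Error would justify throwing. WouldBlock usually means going back to select() or poll() and trying again later.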

Answer 5

Where are you allocating memory for your buffer? The line where you call bzero invokes undefined behavior, since buffer does not point to any valid region of memory.

char *buffer = new char[ BUFFER_SIZE ];
// do processing

// don't forget to release
delete[] buffer;

Answer 6

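This is an article that I always refer to when working with sockets.

THE WORLD OF SELECT()

It will show you how to use 'select()' reliably, and it contains some other useful links at the bottom for further information on sockets.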

Answer 7

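Just to add to several of the posts above:

read() - at least on my system - returns ssize_t. This is like size_t, except that it is signed. On my system it is a long, not an int. You might get compiler warnings if you use int, depending on your system, your compiler, and which warnings you have turned on.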

Answer 8

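For any non-trivial application (i.e. one that must receive and handle different kinds of messages with different lengths), the solution to your particular problem isn't necessarily just a programming solution - it's a convention, i.e. a protocol.

In order to determine how many bytes you should pass to your read call, you should establish a common prefix, or header, that your application receives. That way, when a socket first becomes readable, you can decide what to expect.

A binary example might look like this: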

#include <stdint.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <arpa/inet.h>

enum MessageType {
    MESSAGE_FOO,
    MESSAGE_BAR,
};

struct MessageHeader {
    uint32_t type;
    uint32_t length;
};

/**
 * Attempts to continue reading a `socket` until `bytes` number
 * of bytes are read. Returns truthy on success, falsy on failure.
 *
 * Similar to @grieve's ReadXBytes.
 */
int readExpected(int socket, void *destination, size_t bytes)
{
    /*
    * Can't increment a void pointer, as incrementing
    * is done by the width of the pointed-to type -
    * and void doesn't have a width
    *
    * You can in GCC but it's not very portable
    */
    char *destinationBytes = destination;
    while (bytes) {
        ssize_t readBytes = read(socket, destinationBytes, bytes);
        if (readBytes < 1)
            return 0;
        destinationBytes += readBytes;
        bytes -= readBytes;
    }
    return 1;
}

int main(int argc, char **argv)
{
    int selectedFd;

    // use `select` or `poll` to wait on sockets
    // received a message on `selectedFd`, start reading

    char *fooMessage;
    struct {
        uint32_t a;
        uint32_t b;
    } barMessage;

    struct MessageHeader received;
    if (!readExpected (selectedFd, &received, sizeof(received))) {
        // handle error
    }
    // handle network/host byte order differences maybe
    received.type = ntohl(received.type);
    received.length = ntohl(received.length);

    switch (received.type) {
        case MESSAGE_FOO:
            // "foo" sends an ASCII string or something
            fooMessage = calloc(received.length + 1, 1);
            if (readExpected (selectedFd, fooMessage, received.length))
                puts(fooMessage);
            free(fooMessage);
            break;
        case MESSAGE_BAR:
            // "bar" sends a message of a fixed size
            if (readExpected (selectedFd, &barMessage, sizeof(barMessage))) {
                barMessage.a = ntohl(barMessage.a);
                barMessage.b = ntohl(barMessage.b);
                printf("a + b = %u\n", (unsigned)(barMessage.a + barMessage.b));
            }
            break;
        default:
            puts("Malformed type received");
            // kick the client out probably
    }
}

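You can probably already see one downside of using a binary format - for each attribute larger than a char that you read, you have to make sure its byte order is correct using the ntohl or ntohs functions.

An alternative is to use byte-encoded messages, such as simple ASCII or UTF-8 strings, which avoid byte-order issues entirely but require extra effort to parse and validate.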
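
As an illustration of that extra parsing and validation effort, here is a sketch that checks a length field sent as ASCII decimal digits. The text format, the parseLengthField name and the maxLength cap are all assumptions made for the example, not part of any protocol described above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical validator for a decimal length field received as ASCII text.
// Accepts digits only, rejects empty input and any value above maxLength.
// On success stores the value in length and returns true.
bool parseLengthField(const std::string& field, std::uint32_t maxLength,
                      std::uint32_t& length)
{
    // A uint32_t has at most 10 decimal digits; longer input cannot be valid.
    if (field.empty() || field.size() > 10) {
        return false;
    }
    std::uint64_t value = 0;  // wide enough for any 10-digit number
    for (char c : field) {
        if (c < '0' || c > '9') {
            return false;  // signs, spaces and letters are all rejected
        }
        value = value * 10 + static_cast<std::uint64_t>(c - '0');
    }
    if (value > maxLength) {
        return false;  // refuse lengths the receiver is not willing to buffer
    }
    length = static_cast<std::uint32_t>(value);
    return true;
}
```

Capping the length before allocating is worth doing for the binary header as well, since the value comes straight from the remote side.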

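There are two final considerations for network data in C.

The first is that some C types do not have fixed widths. For example, the humble int is only guaranteed to be at least 16 bits wide, and long is 32 bits on some 64-bit platforms and 64 bits on others. Good, portable code should have network data use fixed-width types, like those defined in stdint.h.

The second is struct padding. A struct whose members have different widths will have padding added between some members to maintain memory alignment, which makes the struct faster to use in the program but sometimes produces confusing results.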

#include <stdio.h>
#include <stdint.h>

int main()
{
    struct A {
        char a;
        uint32_t b;
    } A;

    printf("sizeof(A): %zu\n", sizeof(A));
}

In this example, its actual width isn't 1 char + 4 uint32_t = 5 bytes, it's 8:

mharrison@mharrison-KATANA:~$ gcc -o padding padding.c
mharrison@mharrison-KATANA:~$ ./padding 
sizeof(A): 8

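This is because 3 bytes are added after char a to make sure uint32_t b is memory-aligned.

So if you write a struct A and then attempt to read a char and a uint32_t on the other side, you'll get char a, and a uint32_t where the first three bytes are garbage and the last byte is the first byte of the actual integer you wrote.

Either document your data format explicitly as C struct types or, better yet, document any padding bytes they might contain.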
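
One way to sidestep padding altogether, sketched here as an option rather than the approach this answer prescribes, is to copy each field into a byte buffer yourself instead of writing the struct as a whole. packA, unpackA and kWireSizeA are hypothetical names; the struct mirrors the padding example above.

```cpp
#include <arpa/inet.h>  // htonl, ntohl
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct A {
    char a;
    std::uint32_t b;
};

// Size on the wire: 1 byte for a plus 4 bytes for b, with no padding.
constexpr std::size_t kWireSizeA = 1 + 4;

// Hypothetical helper: writes a, then b in network byte order, into out,
// which must have room for kWireSizeA bytes.
void packA(const A& value, unsigned char* out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(value.a);
    const std::uint32_t b = htonl(value.b);
    std::memcpy(out + 1, &b, sizeof(b));
}

// Hypothetical helper: the inverse of packA.
A unpackA(const unsigned char* in)
{
    assert(in != nullptr);
    A value;
    value.a = static_cast<char>(in[0]);
    std::uint32_t b = 0;
    std::memcpy(&b, in + 1, sizeof(b));
    value.b = ntohl(b);
    return value;
}
```

The sender then writes kWireSizeA bytes instead of sizeof(A), and the layout no longer depends on how either compiler pads the struct.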

8 answers

Answer 1 (48, accepted, 2009-03-20 16:01:26)
Answer 2 (10, 2009-03-20 15:29:02)
Answer 3 (4)
Answer 4 (3, 2009-03-21 03:38:51)
Answer 5 (2, 2009-03-20 15:27:54)
Answer 6 (1, 2009-03-20 16:17:19)
Answer 7 (0, 2019-09-11 16:23:23)
Answer 8 (0, 2020-03-02 12:55:54)
