C++ TCP Socket communication - Connection is working as expected, fails after a couple of seconds, no new data is received and read() and recv() block

I am using 64-bit Ubuntu 16.04 LTS. As the title says, I am attempting to make a TCP socket connection to another device. The program starts by reading data from the socket to initialize the last_recorded_data variable (as seen below, towards the bottom of myStartProcedure()), and I know that this works exactly as expected. Then the rest of the program starts, which is driven by callbacks. When I set UPDATE_BUFFER_MS to something small like 8, it fails after a couple of seconds. An update period of 8 ms is what I actually want, but if I make it larger for testing purposes (something like 500), it works for a little longer and then eventually fails the same way.

The failure is as follows: the device I'm attempting to read from consistently sends data every 8 milliseconds, and the first few bytes of each packet are reserved for telling the client how large the packet is, in bytes. During normal operation, the number of bytes received and the size given by these first few bytes are equal. However, the packet received directly before the read() call starts to block is always 24 bytes shorter than the expected size, even though its size field still reports the expected size. On the next attempt to get data, the read() call blocks and, upon timeout, sets errno to EAGAIN (Resource temporarily unavailable).

I tried communicating with this same device from a Python application and it does not experience the same issue. Furthermore, I tried this C++ application on another one of these devices and I'm seeing the same behavior, so I think the problem is on my end. My code (simplified) is below. Please let me know if you see any obvious errors, thank you!

#include <string>
#include <unistd.h>
#include <iostream>

#include <stdio.h>
#include <errno.h>
#include <sys/socket.h>
#include <stdlib.h>
#include <string.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define COMM_DOMAIN AF_INET
#define PORT        8008

#define TIMEOUT_SECS  3
#define TIMEOUT_USECS 0

#define UPDATE_BUFFER_MS 8

#define PACKET_SIZE_BYTES_MAX 1200

//
// Global variables
//

// Socket file descriptor
int socket_conn;

// Tracks the timestamp of the last time data was recorded
// The data packet from the TCP connection is sent every UPDATE_BUFFER_MS milliseconds
unsigned long last_process_cycle_timestamp;

// The most recently heard data, cast to a double
double last_recorded_data;

// The number of bytes expected from a full packet
int full_packet_size;

// The minimum number of bytes needed from the packet, as I don't need all of the data
int min_required_packet_size;

// Helper to cast the packet data to a double
union PacketAsFloat
{
    unsigned char byte_values[8];
    double decimal_value;
};

// Simple struct to package the data read from the socket
struct SimpleDataStruct
{
    // Whether or not the struct was properly populated
    bool valid;

    // Some data that we're interested in right now
    double important_data;

    //
    // Other, irrelevant members removed for simplicity
    //
};

// Procedure to read the next data packet
SimpleDataStruct readCurrentData()
{
    SimpleDataStruct data;
    data.valid = false;

    unsigned char socket_data_buffer[PACKET_SIZE_BYTES_MAX] = {0};

    int read_status = read(socket_conn, socket_data_buffer, PACKET_SIZE_BYTES_MAX);
    if (read_status < min_required_packet_size)
    {
        return data;
    }

    //for (int i = 0; i < read_status - 1; i++)
    //{
    //  std::cout << static_cast<int>(socket_data_buffer[i]) << ", ";
    //}
    //std::cout << static_cast<int>(socket_data_buffer[read_status - 1]) << std::endl;

    PacketAsFloat packet_union;
    for (int j = 0; j < 8; j++)
    {
        packet_union.byte_values[7 - j] = socket_data_buffer[j + 252];
    }

    data.important_data = packet_union.decimal_value;
    data.valid          = true;

    return data;
}

// This acts as the main entry point
void myStartProcedure(std::string host)
{
    //
    // Code to determine the value for full_packet_size and min_required_packet_size (because it can vary) was removed
    // Simplified version is below
    //

    full_packet_size         = some_known_value;
    min_required_packet_size = some_other_known_value;

    //
    // Create socket connection
    //

    if ((socket_conn = socket(COMM_DOMAIN, SOCK_STREAM, 0)) < 0)
    {
        std::cout << "socket_conn heard a bad value..." << std::endl;
        return;
    }

    struct sockaddr_in socket_server_address;
    memset(&socket_server_address, '0', sizeof(socket_server_address));

    socket_server_address.sin_family = COMM_DOMAIN;
    socket_server_address.sin_port   = htons(PORT);

    // Create and set timeout
    struct timeval timeout_chars;
    timeout_chars.tv_sec  = TIMEOUT_SECS;
    timeout_chars.tv_usec = TIMEOUT_USECS;

    setsockopt(socket_conn, SOL_SOCKET, SO_RCVTIMEO, (const char*)&timeout_chars, sizeof(timeout_chars));

    if (inet_pton(COMM_DOMAIN, host.c_str(), &socket_server_address.sin_addr) <= 0)
    {
        std::cout << "Invalid address heard..." << std::endl;
        return;
    }

    if (connect(socket_conn, (struct sockaddr *)&socket_server_address, sizeof(socket_server_address)) < 0)
    {
        std::cout << "Failed to make connection to " << host << ":" << PORT << std::endl;
        return;
    }
    else
    {
        std::cout << "Successfully brought up socket connection..." << std::endl;
    }

    // Sleep for half a second to let the networking setup properly
    sleepMilli(500); // A sleep function I defined elsewhere

    SimpleDataStruct initial = readCurrentData();
    if (initial.valid)
    {
        last_recorded_data = initial.important_data;
    }
    else
    {
        // Error handling
        return; // the function is void, so there is no status to return
    }

    //
    // Start the rest of the program, which is driven by callbacks
    //
}

void updateRequestCallback()
{
    unsigned long now_ns = currentTime(); // A function I defined elsewhere that gets the current system time in nanoseconds

    if (now_ns - last_process_cycle_timestamp >= 1000000 * UPDATE_BUFFER_MS)
    {
        SimpleDataStruct current_data = readCurrentData();

        if (current_data.valid)
        {
            last_recorded_data = current_data.important_data;
            last_process_cycle_timestamp = now_ns;
        }
        else
        {
            // Error handling
            std::cout << "ERROR setting updated data, SimpleDataStruct was invalid." << std::endl;
            return;
        }
    }
}

EDIT #1

I should be receiving a certain number of bytes every time, and I would expect read() to return that number as well. However, I just tried changing PACKET_SIZE_BYTES_MAX to 2048, and read() now returns 2048, when it should return the size of the packet that the device is sending back (NOT 2048). The Python application also sets the maximum to 2048, and the packet size it gets back is the correct/expected size...

Try commenting out the timeout setup. I never use that on my end and I don't experience the problem you're talking about.

// Create and set timeout
struct timeval timeout_chars;
timeout_chars.tv_sec  = TIMEOUT_SECS;
timeout_chars.tv_usec = TIMEOUT_USECS;

setsockopt(socket_conn, SOL_SOCKET, SO_RCVTIMEO, (const char*)&timeout_chars, sizeof(timeout_chars));

To avoid blocking, you can set up the socket as non-blocking and then use select() or poll() to wait for more data. Both of these functions accept a timeout like the one above. However, with a non-blocking socket you must make sure that the read works as expected. In many cases you will get a partial read and have to wait (select() or poll()) again for more data, so the code will be a bit more complicated.

socket_conn = socket(COMM_DOMAIN, SOCK_STREAM | SOCK_NONBLOCK, 0);

If security is a potential issue, I would also set SOCK_CLOEXEC so that the socket is closed automatically on exec() and does not leak into programs started by a child process.
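As a rough sketch of combining the two flags and checking that they took effect: this assumes Linux, where SOCK_NONBLOCK and SOCK_CLOEXEC can be OR-ed into the type argument of socket(). The helper names (makeClientSocket, isCloseOnExec, isNonBlocking) are made up for illustration and are not from your code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a TCP socket that is non-blocking and
// close-on-exec. Linux-specific: both flags go in the type argument.
int makeClientSocket()
{
    return socket(AF_INET, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0);
}

// True if the descriptor has the close-on-exec flag set.
bool isCloseOnExec(int fd)
{
    assert(fd >= 0);
    int const flags = fcntl(fd, F_GETFD);
    return flags != -1 && (flags & FD_CLOEXEC) != 0;
}

// True if the descriptor is in non-blocking mode.
bool isNonBlocking(int fd)
{
    assert(fd >= 0);
    int const flags = fcntl(fd, F_GETFL);
    return flags != -1 && (flags & O_NONBLOCK) != 0;
}
```

On systems without these socket() flags, the same result can be obtained with fcntl() and F_SETFD / F_SETFL after the socket is created.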

std::vector<struct pollfd> fds;

struct pollfd fd;
fd.fd = socket_conn;
fd.events = POLLIN | POLLPRI | POLLRDHUP; // also POLLOUT for writing
fd.revents = 0; // probably useless... (kernel should clear those)
fds.push_back(fd);

int64_t timeout_chars = TIMEOUT_SECS * 1000 + TIMEOUT_USECS / 1000;

int const r = poll(&fds[0], fds.size(), timeout_chars);
if(r < 0) { /* ...handle error(s)... */ }
// r == 0 means the timeout expired without any event
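To show how the poll() call above fits together with the read that follows it, here is a minimal sketch. The function names waitReadable and readSome are hypothetical, and the choice to report a timeout as ETIMEDOUT is mine, not something poll() does. Note that readSome() can still return fewer bytes than requested, so the caller has to keep calling it until the whole packet has arrived.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Wait until fd has something to read, or until timeout_ms elapses.
// Returns 1 if readable, 0 on timeout, -1 on error (errno set by poll()).
int waitReadable(int fd, int timeout_ms)
{
    assert(fd >= 0);
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    for (;;)
    {
        int const r = poll(&pfd, 1, timeout_ms);
        if (r < 0 && errno == EINTR)
        {
            continue; // interrupted by a signal; wait again (timeout restarts)
        }
        if (r <= 0)
        {
            return r; // 0: timeout, -1: error
        }
        // POLLHUP or POLLERR may be reported instead of POLLIN; treat them
        // as readable so the following read() returns 0 or the error.
        return 1;
    }
}

// Read up to `size` bytes, waiting at most timeout_ms for data to arrive.
// Returns the byte count (possibly less than `size`), 0 if the peer closed,
// or -1 on timeout/error; errno is ETIMEDOUT when the wait timed out.
ssize_t readSome(int fd, void* buffer, size_t size, int timeout_ms)
{
    int const ready = waitReadable(fd, timeout_ms);
    if (ready == 0)
    {
        errno = ETIMEDOUT;
        return -1;
    }
    if (ready < 0)
    {
        return -1;
    }
    return read(fd, buffer, size);
}
```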

Another method, assuming the header size is well defined and never changes, is to read the header and then use the header information to read the rest of the data. In that case you can keep the blocking socket without any timeout. From your structures I have no idea what the header looks like, so let's first define such a structure:

struct header
{
    char sync[4];  // four bytes indicating a synchronization point
    uint32_t size; // size of packet
    ...            // some other info
};

I added a "sync" field. With TCP, people often add such a field so that if you lose track of your position you can seek to the next sync by reading one byte at a time. Frankly, with TCP, you should never get a transmission error like that. You may lose the connection, but never lose data from the stream (i.e. TCP is like a perfect FIFO over your network). That being said, if you are working on mission-critical software, a sync and also a checksum would be very welcome.
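Seeking to the next sync one byte at a time could look roughly like the sketch below. The marker value "SYNC" and the function name seekToSync are invented for illustration; the real marker, if your device sends one at all, depends on its protocol. It assumes a blocking socket.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical 4-byte marker; the real value depends on the device protocol.
static const char kSyncMarker[4] = {'S', 'Y', 'N', 'C'};

// Consume bytes from a blocking fd one at a time until the last four bytes
// read equal kSyncMarker. Returns true once the marker has been consumed,
// false on end of stream or read error.
bool seekToSync(int fd)
{
    assert(fd >= 0);
    char window[4] = {0, 0, 0, 0};
    size_t seen = 0;

    for (;;)
    {
        char byte;
        ssize_t const r = read(fd, &byte, 1);
        if (r < 0 && errno == EINTR)
        {
            continue; // interrupted by a signal, retry
        }
        if (r <= 0)
        {
            return false; // 0: peer closed, -1: error (including a timeout)
        }
        // Slide the window left by one byte and append the new byte.
        std::memmove(window, window + 1, 3);
        window[3] = byte;
        ++seen;
        if (seen >= 4 && std::memcmp(window, kSyncMarker, 4) == 0)
        {
            return true;
        }
    }
}
```

After it returns true, the next byte read from the socket is the first byte following the marker, which in the header above would be the size field.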

Next we read() just the header. Now we know the exact size of this packet, so we can read exactly that many bytes into our packet buffer:

struct header hdr;
read(socket_conn, &hdr, sizeof(hdr));
read(socket_conn, packet, hdr.size /* - sizeof(hdr) */);

Obviously, read() may return an error or fewer bytes than requested, and the size in the header may be defined in big endian (so you need to swap the bytes on x86 processors). But that should get you going.

Also, if the size found in the header includes the number of bytes in the header, make sure to subtract that amount when reading the rest of the packet.
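Putting the header-then-body idea together, here is a sketch that also loops over short reads, since a single read() on a TCP socket may return only part of what was asked for. Everything about the wire format is an assumption: 4 sync bytes followed by a 32-bit big-endian size that counts only the payload. The names readExact, readPacket and kMaxPayload are hypothetical.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

// Read exactly `size` bytes from a blocking fd, looping over short reads.
// Returns false if the peer closes the connection or read() fails.
bool readExact(int fd, void* buffer, size_t size)
{
    assert(buffer != nullptr || size == 0);
    unsigned char* out = static_cast<unsigned char*>(buffer);
    size_t done = 0;
    while (done < size)
    {
        ssize_t const r = read(fd, out + done, size - done);
        if (r < 0 && errno == EINTR)
        {
            continue; // interrupted by a signal, retry
        }
        if (r <= 0)
        {
            return false; // 0: peer closed, -1: error or SO_RCVTIMEO timeout
        }
        done += static_cast<size_t>(r);
    }
    return true;
}

// Assumed layout, mirroring the header struct above: 4 sync bytes followed
// by a 32-bit big-endian payload size that excludes the header itself.
struct Header
{
    char     sync[4];
    uint32_t size;
};

// Hypothetical upper bound so a corrupt size cannot trigger a huge allocation.
static const uint32_t kMaxPayload = 64 * 1024;

// Read one header and then exactly the payload it announces.
bool readPacket(int fd, std::vector<unsigned char>& payload)
{
    Header hdr;
    if (!readExact(fd, &hdr, sizeof(hdr)))
    {
        return false;
    }
    uint32_t const size = ntohl(hdr.size); // big endian -> host byte order
    if (size > kMaxPayload)
    {
        return false;
    }
    payload.resize(size);
    return readExact(fd, payload.data(), size);
}
```

If the size field in your protocol includes the header, subtract sizeof(Header) before resizing the buffer, and check first that the announced size is not smaller than the header.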


Also, the following is wrong:

memset(&socket_server_address, '0', sizeof(socket_server_address));

You meant to clear the structure with zeroes, not with the character '0'. Although since it connects, it probably doesn't matter much. Just use 0 instead of '0'.
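For reference, a small sketch of filling in the address with the structure properly cleared. The helper name makeServerAddress is made up; the calls themselves (memset, htons, inet_pton) are the same ones your code already uses.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <netinet/in.h>

// Build an IPv4 address structure with every byte cleared first.
// `ip` must be dotted-decimal text such as "192.0.2.1"; returns false if
// inet_pton() rejects it.
bool makeServerAddress(const char* ip, uint16_t port, sockaddr_in& out)
{
    assert(ip != nullptr);
    std::memset(&out, 0, sizeof(out)); // integer 0, not the character '0' (0x30)
    out.sin_family = AF_INET;
    out.sin_port   = htons(port);
    return inet_pton(AF_INET, ip, &out.sin_addr) == 1;
}
```

With '0' instead of 0, every byte not overwritten afterwards (such as the sin_zero padding) would hold 0x30 rather than zero.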
