
Sockets stop closing after 32739 connections

UPDATE: After investigating a little more, I found the real cause of this behavior. I was creating a thread for each connection and passing the socket fd to it, but I was not calling pthread_join on those threads promptly, so after a while my main thread could no longer create new threads for accepted connections. My logic for closing the socket lives in the child thread, so those sockets were never closed and ended up in the CLOSE_WAIT state. I now detach each thread right after creating it, and everything works so far.
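For readers who want to see what the fix in the update might look like, here is a minimal sketch, assuming POSIX threads. The names handle_client and spawn_detached_worker are hypothetical and not taken from the original server; the thread is created detached through an attribute rather than by calling pthread_detach afterwards, which has the same effect.

```c
#include <pthread.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical per-connection worker: it owns the fd and closes it on exit. */
static void *handle_client(void *arg)
{
    int fd = (int)(intptr_t)arg;

    /* ... read from and write to fd here ... */

    close(fd);          /* the worker is the only place the fd is closed */
    return NULL;
}

/* Start a detached worker for client_fd.
 * Returns 0 on success or a pthread error number on failure.
 * On failure the fd is closed here so that it cannot leak. */
static int spawn_detached_worker(int client_fd)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc = pthread_attr_init(&attr);

    if (rc != 0) {
        close(client_fd);
        return rc;
    }
    /* A detached thread releases its resources when it exits,
     * so nobody has to call pthread_join on it. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    rc = pthread_create(&tid, &attr, handle_client,
                        (void *)(intptr_t)client_fd);
    pthread_attr_destroy(&attr);
    if (rc != 0)
        close(client_fd);
    return rc;
}
```

The accept loop would call spawn_detached_worker(client_fd) once per accepted connection. Checking the return value matters: if thread creation fails and the fd is not closed, the socket stays open and will sit in CLOSE_WAIT once the client disconnects.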

I have a client/server program. I use a script that runs the client to open as many connections as possible; each client sends one line of data, closes its connection, and exits. Everything works fine up to the 32739th connection, i.e. each connection is closed on both sides. After that number, connections are no longer closed and the server stops accepting new ones. If I run

netstat -tonpa 2>&1 | grep CLOSE

I see around 1020 sockets in CLOSE_WAIT. Sample output from the command:

tcp 25 0 192.168.0.175:16099 192.168.0.175:41704 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 24 0 192.168.0.175:16099 192.168.0.175:41585 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 30 0 192.168.0.175:16099 192.168.0.175:41679 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 31 0 192.168.0.175:16099 192.168.0.175:41339 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)
tcp 25 0 192.168.0.175:16099 192.168.0.175:41760 CLOSE_WAIT 5250/./bl_manager off (0.00/0/0)

I am using the following code to detect client disconnection:

for(fd = 0; fd <= fd_max; fd++) {
    if(FD_ISSET(fd, &testfds)) {
       if (fd == client_fd) {
           ioctl(fd, FIONREAD, &nread);
           if(nread == 0) {
               FD_CLR(fd, &readfds);
               close(fd);
               return 0;
           }
       }
    }
} /* for()*/

Please let me know if I am doing anything wrong. It is a Python client and C++ server setup.

Thank you.

CLOSE-WAIT means the port is waiting for the local application to close the socket, having already received a close from the peer. Clearly you are leaking sockets somehow, possibly in an error path.

Your code to 'detect client disconnection' is completely incorrect. All you are testing is the amount of data that can be read without blocking, i.e. data that has already arrived. The correct test is a return value of zero from recv(), or an error other than EAGAIN/EWOULDBLOCK when reading or writing.
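As an illustration of that test, here is a minimal sketch. The helper read_status and its enum are hypothetical names invented for this example; only recv() and the errno values come from the answer. It assumes len is greater than zero, since a zero-length recv() on a stream socket can also return 0.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for one read attempt. */
enum read_result {
    READ_DATA,          /* *nread bytes were placed in buf */
    READ_PEER_CLOSED,   /* recv() returned 0: orderly shutdown by the peer */
    READ_WOULD_BLOCK,   /* non-blocking socket, nothing to read right now */
    READ_ERROR          /* any other error: treat the connection as dead */
};

/* Read once from fd and classify the outcome. Assumes len > 0. */
static enum read_result read_status(int fd, char *buf, size_t len,
                                    ssize_t *nread)
{
    ssize_t n;

    do {
        n = recv(fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);   /* retry if a signal interrupted us */

    *nread = n;
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_PEER_CLOSED;
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return READ_WOULD_BLOCK;
    return READ_ERROR;
}
```

On READ_PEER_CLOSED or READ_ERROR the caller would remove fd from its read set and close() it; that close is what moves the socket out of CLOSE_WAIT.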

Without knowing your platform, I can't be sure, but given that you're clearly using select, and that the problem appears only a few dozen connections short of 32768, it seems very likely that you are running into the size limit of fd_set.

An fd_set is a collection of bits, indexed by file descriptor number. Every platform has a different maximum. OpenBSD and recent versions of FreeBSD and OS X usually limit fd_set to an FD_SETSIZE that defaults to 1024. Different Linux boxes seem to have 1024, 4096, 32768, and 65536.

So, what happens if you call FD_ISSET(32800, &testfds) and FD_SETSIZE is 32768? You're asking it to read a bit from arbitrary memory.

A select or other call before this should give you an EINVAL error when you pass in 32800 for the nfds parameter… but historically, many platforms have not done so. Or they have returned an error, but only after filling in the first FD_SETSIZE bits properly and leaving the rest as uninitialized memory, which means that if you forget to check the error, your code seems to work until you stress it.
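One way to avoid the out-of-range access is to check every descriptor against FD_SETSIZE before it goes into a set. The following is a minimal sketch; safe_fd_set is a hypothetical wrapper written for this example, not a standard function.

```c
#include <sys/select.h>

/* Hypothetical wrapper: add fd to set only if it fits in an fd_set.
 * Also tracks the highest fd seen so far in *fd_max.
 * Returns 0 on success, -1 if fd cannot be used with select(). */
static int safe_fd_set(int fd, fd_set *set, int *fd_max)
{
    /* FD_SET on an fd outside [0, FD_SETSIZE) writes outside the fd_set. */
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;

    FD_SET(fd, set);
    if (fd > *fd_max)
        *fd_max = fd;
    return 0;
}
```

When safe_fd_set returns -1 for a freshly accepted socket, the server can close that socket and log the rejection instead of corrupting memory. The return value of select() itself should be checked as well, so that an EINVAL does not go unnoticed.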

This is one of the reasons why using select for more than a few hundred sockets is a bad idea. The other reason is that select is linear (and, worse, linear not in the number of current sockets but in the highest fd, so even after most clients go away it's still slow).

Most modern platforms that have select also have poll, which avoids that problem.
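For comparison, here is a minimal sketch of waiting on a single descriptor with poll. The helper wait_readable is a hypothetical name for this example. Because poll takes an array of struct pollfd rather than a bitmap indexed by fd number, a descriptor above FD_SETSIZE is not a problem.

```c
#include <poll.h>

/* Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if fd is readable or the peer hung up (a following recv()
 * will then return data, 0, or an error), 0 on timeout, -1 on failure. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;    /* we only care about readability */
    pfd.revents = 0;

    rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;          /* errno is set by poll(), e.g. EINTR */
    if (rc == 0)
        return 0;           /* timed out, nothing happened */
    if (pfd.revents & POLLNVAL)
        return -1;          /* fd is not an open descriptor */
    return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

A real server would keep one struct pollfd per client in an array and pass the whole array in a single poll call; the cost then scales with the number of descriptors being watched rather than with the highest fd value.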

Unless you're on Windows… in which case there are completely different reasons not to use select, and different answers.
