如何长时间正确维护监听端口？

Question

I wrote this small server application in pure C that listens to incoming connections in a given port, very simple stuff. 我用纯C编写了这个小型服务器应用程序，它监听给定端口中的传入连接，非常简单。

It goes with the usual socket initialization procedure, create the socket() then bind() to the port, says its a listen() , and ifinitely loops through a select() waiting for incoming connections to accept() . 它与通常的套接字初始化过程一致，创建socket()然后bind()到端口，说它是一个listen() ，并且ifitely循环通过select()等待传入连接accept() 。

All goes just fine and works like a charm, except that if I leave the thing running for a couple months, the listening port closes while the application server keeps running unaware of it, since I wrote it to trust the listening socket will not close if not told to. 一切都很好，就像一个魅力，除了如果我让事情运行几个月，监听端口关闭，而应用程序服务器不知不觉地运行，因为我写它信任监听套接字不会关闭，如果没告诉。

So the question is: Why the hell is the port being closed without my application's concern and what can I do to prevent it from happening? 所以问题是：为什么在没有我的应用程序关注的情况下关闭端口，我该怎么做以防止它发生？

Is that expected behaviour? 这是预期的行为吗？ Should I check for some kind of exceptions or make "health check" on the listening socket to reopen it if necessary? 我应该检查某种异常还是在监听套接字上进行“健康检查”，以便在必要时重新打开它？

Code: https://gist.github.com/Havenard/e930be035a3bee75c018 (yes I realize I'm using 0 as cue for errors and it's bad pratice and stuff, but it is not relevant to the question as I explained in the comments, when I set the socket file descriptor to 0 it is to stop the loop and shut down the application). 代码： https ： //gist.github.com/Havenard/e930be035a3bee75c018 （是的，我知道我使用0作为错误的提示，这是不好的实践和东西，但它与我在评论中解释的问题无关，当我将套接字文件描述符设置为0它将停止循环并关闭应用程序）。

Answer 1

I would start by cleaning it up: 我会从清理它开始：

cut it up into smaller, readable, verifyable , testable functions 将其切割成更小，可读，可验证，可测试的功能
the linked lists usage look messy; 链表用法看起来很乱; it could be simplified a lot, maybe by introducing some generic functions. 它可以简化很多，可能通过引入一些通用函数。
replace all the silly '\\x20' character constants by the more readable ' ' equivalents 用更可读的“等价物”替换所有愚蠢的'\\ x20'字符常量
avoid manifest magic constants like here if (n_case > 0) memcpy(nick, node->nick, (n_case > 32 ? 32 : n_case)); if (n_case > 0) memcpy(nick, node->nick, (n_case > 32 ? 32 : n_case));避免显示类似于此处的明显魔术常量if (n_case > 0) memcpy(nick, node->nick, (n_case > 32 ? 32 : n_case)); ; ; sizeof is your friend. sizeof是你的朋友。
don't use zero as a sentinel value for an unused file descriptor; 不要将零用作未使用文件描述符的标记值; use -1 instead. 用-1代替。
use unsigned types for sizes and indexes; 使用无符号类型的大小和索引; negative indexes will corrupt memory, fold-over unsigned types will fail fast. 负索引会破坏内存，折叠无符号类型会快速失败。 (failfast is your friend) （failfast是你的朋友）

That's only a few hours of editing. 这只是几个小时的编辑。

My guess is that, after cleanup/refactoring your "bug" will come to the surface magically. 我的猜测是，在清理/重构后，你的“虫子”会神奇地浮出水面。

Footnote: No,I won't do your work for you. 脚注：不，我不会为你做你的工作。 Not for 100 points, not for 1000. Please clean up your own mess. 不是100分，不是1000分。请清理你自己的烂摊子。

Answer 2

This answer is mostly a code review of places where you call close() . 这个答案主要是对你调用close()的地方的代码审查。

Line 330: You close the socket but don't continue right away like in other places in your code. 第330行：关闭套接字但不要像代码中的其他位置那样立即继续。 This could lead to weird behavior. 这可能会导致奇怪的行为。

Line 928: In most places, you set the client or server socket to 0 after a call to close() . 第928行：在大多数地方，在调用close()后将客户端或服务器套接字设置为0 。 You don't after this call. 这次通话后你没有。

Line 1193: Same comment as line 928. 第1193行：与第928行相同的注释。

Line 1195: Same comment as line 928. 第1195行：与第928行相同的注释。

Line 1218: Same comment as line 928. 第1218行：与第928行相同的注释。

Line 1234: Same comment as line 928. 第1234行：与第928行相同的注释。

Line 1236: Same comment as line 928. 第1236行：与第928行相同的注释。

When I compiled the code with full warnings, I saw a number of places where the compiler noted functions declared to return a value, but no value is being returned. 当我用完全警告编译代码时，我看到了许多地方编译器声明函数声明返回一个值，但没有返回任何值。

x.c:582: warning: no return statement in function returning non-void
x.c:591: warning: no return statement in function returning non-void
x.c:598: warning: no return statement in function returning non-void
x.c:609: warning: no return statement in function returning non-void
x.c:620: warning: no return statement in function returning non-void
x.c:728: warning: no return statement in function returning non-void
x.c:779: warning: no return statement in function returning non-void

There are many other problems, as noted in other posts. 如其他帖子所述，还有许多其他问题。

As far as debugging this issue, If I suspected the binding socket was being closed early, I would intercept the close() call with my own version that asserts that the descriptor being closed should not match the binding socket. 至于调试这个问题，如果我怀疑绑定套接字是提前关闭的，我会用我自己的版本拦截close()调用，该版本声明被关闭的描述符不应该与绑定套接字匹配。

However, as wildplasser noted, select() would return an error about the invalid descriptor if it was closed. 但是，正如wildplasser所指出的那样， select()会在无效描述符关闭时返回错误。

Answer 3

The bug is that you are using 0 as invalid file descriptor. 错误是您使用0作为无效文件描述符。 0 is perfectly valid and is usually stdin. 0完全有效，通常是标准输入。 Then the listener is set to 0 in the signal handler. 然后在信号处理程序中将侦听器设置为0。 Then you use 0 as no fd, and at some point you do close(0) on some socket, there are branches that do close(fd) without checkoing it for 0 and that effectively closes the listener. 然后你使用0作为没有fd，并且在某些时候你在某个套接字上关闭（0），有些分支关闭（fd）而不检查它为0并且有效地关闭了监听器。 The other possible option to stop the listener from working is to overflow the backlog. 阻止侦听器工作的另一个可能选项是溢出积压。

And one more problem - using unsigned int for fds. 还有一个问题 - 对fds使用unsigned int。 system calls return -1 on error ... and that error would not be detected with if assigned to unsigned int struct identd_node -> unsigned int handle; 系统调用在错误时返回-1 ...如果分配给unsigned int struct identd_node - > unsigned int handle，则不会检测到该错误; struct thread_node -> unsigned int skt_clnt, skt_serv; struct thread_node - > unsigned int skt_clnt，skt_serv;

Answer 4

It looks like your code needs to have 2 consecutive errors to cause a failure. 看起来您的代码需要连续2次错误才能导致失败。

If you get an error from select, why not print out why straight away? 如果您从选择中收到错误，为什么不立即打印出原因？

On line 281, printf errno/perror to find out what the problem is? 在第281行，printf errno / perror找出问题所在？

Answer 5

Though the system should not behave as described, it sometimes does that. 虽然系统不应该如上所述，但它有时会这样做。 For server systems usually you need to perform a healthcheck call, either externally (from script), or from a special thread in your code. 对于服务器系统，通常需要在外部（从脚本）或从代码中的特殊线程执行运行状况检查。

So, if you detect that you can't connect to server in few consecutive attempts (few is needed due to possible overload condition), you can consider socket broken and recreate it or restart the server. 因此，如果您检测到连续几次尝试无法连接到服务器（由于可能的过载情况而需要很少），您可以考虑断开套接字并重新创建它或重新启动服务器。

如何长时间正确维护监听端口？

问题描述

5 个解决方案

解决方案1
6 2013-03-06 19:40:42

解决方案2
2 2013-03-12 21:25:36

解决方案3
1 2013-03-10 22:27:37

解决方案4
0 2013-03-07 12:50:11

解决方案5
0 2013-03-13 17:32:57

如何长时间正确维护监听端口？

问题描述

5 个解决方案

解决方案1 6 2013-03-06 19:40:42

解决方案2 2 2013-03-12 21:25:36

解决方案3 1 2013-03-10 22:27:37

解决方案4 0 2013-03-07 12:50:11

解决方案5 0 2013-03-13 17:32:57

解决方案1
6 2013-03-06 19:40:42

解决方案2
2 2013-03-12 21:25:36

解决方案3
1 2013-03-10 22:27:37

解决方案4
0 2013-03-07 12:50:11

解决方案5
0 2013-03-13 17:32:57