How to properly maintain a listening port for a long time?

Question

I wrote this small server application in pure C that listens to incoming connections in a given port, very simple stuff.

It goes with the usual socket initialization procedure, create the socket() then bind() to the port, says its a listen() , and ifinitely loops through a select() waiting for incoming connections to accept() .

All goes just fine and works like a charm, except that if I leave the thing running for a couple months, the listening port closes while the application server keeps running unaware of it, since I wrote it to trust the listening socket will not close if not told to.

So the question is: Why the hell is the port being closed without my application's concern and what can I do to prevent it from happening?

Is that expected behaviour? Should I check for some kind of exceptions or make "health check" on the listening socket to reopen it if necessary?

Code: https://gist.github.com/Havenard/e930be035a3bee75c018 (yes I realize I'm using 0 as cue for errors and it's bad pratice and stuff, but it is not relevant to the question as I explained in the comments, when I set the socket file descriptor to 0 it is to stop the loop and shut down the application).

Answer 1

I would start by cleaning it up:

cut it up into smaller, readable, verifyable , testable functions
the linked lists usage look messy; it could be simplified a lot, maybe by introducing some generic functions.
replace all the silly '\\x20' character constants by the more readable ' ' equivalents
avoid manifest magic constants like here if (n_case > 0) memcpy(nick, node->nick, (n_case > 32 ? 32 : n_case)); ; sizeof is your friend.
don't use zero as a sentinel value for an unused file descriptor; use -1 instead.
use unsigned types for sizes and indexes; negative indexes will corrupt memory, fold-over unsigned types will fail fast. (failfast is your friend)

That's only a few hours of editing.

My guess is that, after cleanup/refactoring your "bug" will come to the surface magically.

Footnote: No,I won't do your work for you. Not for 100 points, not for 1000. Please clean up your own mess.

Answer 2

This answer is mostly a code review of places where you call close() .

Line 330: You close the socket but don't continue right away like in other places in your code. This could lead to weird behavior.

Line 928: In most places, you set the client or server socket to 0 after a call to close() . You don't after this call.

Line 1193: Same comment as line 928.

Line 1195: Same comment as line 928.

Line 1218: Same comment as line 928.

Line 1234: Same comment as line 928.

Line 1236: Same comment as line 928.

When I compiled the code with full warnings, I saw a number of places where the compiler noted functions declared to return a value, but no value is being returned.

x.c:582: warning: no return statement in function returning non-void
x.c:591: warning: no return statement in function returning non-void
x.c:598: warning: no return statement in function returning non-void
x.c:609: warning: no return statement in function returning non-void
x.c:620: warning: no return statement in function returning non-void
x.c:728: warning: no return statement in function returning non-void
x.c:779: warning: no return statement in function returning non-void

There are many other problems, as noted in other posts.

As far as debugging this issue, If I suspected the binding socket was being closed early, I would intercept the close() call with my own version that asserts that the descriptor being closed should not match the binding socket.

However, as wildplasser noted, select() would return an error about the invalid descriptor if it was closed.

Answer 3

The bug is that you are using 0 as invalid file descriptor. 0 is perfectly valid and is usually stdin. Then the listener is set to 0 in the signal handler. Then you use 0 as no fd, and at some point you do close(0) on some socket, there are branches that do close(fd) without checkoing it for 0 and that effectively closes the listener. The other possible option to stop the listener from working is to overflow the backlog.

And one more problem - using unsigned int for fds. system calls return -1 on error ... and that error would not be detected with if assigned to unsigned int struct identd_node -> unsigned int handle; struct thread_node -> unsigned int skt_clnt, skt_serv;

Answer 4

It looks like your code needs to have 2 consecutive errors to cause a failure.

If you get an error from select, why not print out why straight away?

On line 281, printf errno/perror to find out what the problem is?

Answer 5

Though the system should not behave as described, it sometimes does that. For server systems usually you need to perform a healthcheck call, either externally (from script), or from a special thread in your code.

So, if you detect that you can't connect to server in few consecutive attempts (few is needed due to possible overload condition), you can consider socket broken and recreate it or restart the server.

How to properly maintain a listening port for a long time?

Question

5 answers

solution1
6 2013-03-06 19:40:42

solution2
2 2013-03-12 21:25:36

solution3
1 2013-03-10 22:27:37

solution4
0 2013-03-07 12:50:11

solution5
0 2013-03-13 17:32:57

How to properly maintain a listening port for a long time?

Question

5 answers

solution1 6 2013-03-06 19:40:42

solution2 2 2013-03-12 21:25:36

solution3 1 2013-03-10 22:27:37

solution4 0 2013-03-07 12:50:11

solution5 0 2013-03-13 17:32:57

solution1
6 2013-03-06 19:40:42

solution2
2 2013-03-12 21:25:36

solution3
1 2013-03-10 22:27:37

solution4
0 2013-03-07 12:50:11

solution5
0 2013-03-13 17:32:57