
Should a TCP client be able to pause the server when the TCP server reads a non-blocking socket?

Overview

I have a simple question with code below. Hopefully I didn't make a mistake in the code.

I'm a network engineer, and I need to test certain linux behavior of our business application keepalives during network outages (I'm going to insert some iptables stuff later to jack with the connection - first I want to make sure I got the client & server right).

As part of a network failure test I'm conducting, I wrote a non-blocking Python TCP client and server that are supposed to blindly send messages to each other in a loop. To understand what's happening I am using loop counters.

The server's loop should be relatively straightforward. I loop through every fd that select says is ready. I never even import sleep anywhere in my server's code. From this perspective, I don't expect the server's code to pause while it loops over the client's socket, but for some reason the server code pauses intermittently (more detail below).

I initially didn't put a sleep in the client's loop. Without a sleep on the client side, the server and client seem to be as efficient as I want. However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.

My questions:

  • Should I be able to write a single-threaded Python TCP server that doesn't pause when the client hits time.sleep() in the client's fd.send() loop? If so, what am I doing wrong? <- ANSWERED
  • If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?

Reproducing the scenario

I'm running this on two RHEL6 linux machines. To reproduce the issue...

  • Open two different terminals.
  • Save the client and server scripts in different files
  • Change the shebang path to your local python (I'm using Python 2.7.15)
  • Change the SERVER_HOSTNAME and SERVER_DOMAIN in the client's code to be the hostname and domain of the server you're running this on
  • Start the server first, then start the client.

After the client connects, you'll see messages as shown in EXHIBIT 1 scrolling quickly in the server's terminal. After a few seconds, the scrolling pauses intermittently when the client hits time.sleep(). I don't expect to see those pauses, but maybe I've misunderstood something.

EXHIBIT 1

---
LOOP_COUNT 0
---
LOOP_COUNT 1
---
LOOP_COUNT 2
---
LOOP_COUNT 3
CLIENTMSG: 'client->server 0'
---
LOOP_COUNT 4
---
LOOP_COUNT 5
---
LOOP_COUNT 6
---
LOOP_COUNT 7
---
LOOP_COUNT 8
---
LOOP_COUNT 9
---
LOOP_COUNT 10
---
LOOP_COUNT 11
---

Summary resolution

If I wrote this test code correctly and the server shouldn't pause, why is the TCP server intermittently pausing while it polls the client's connection for data?

Answering my own question. My blocking problem was caused by calling select() with a non-zero timeout.

When I changed select() to use a zero-second timeout, I got expected results.
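
The effect of the timeout argument can be seen in isolation with a small sketch. It is written for Python 3 rather than the Python 2.7 used in the scripts, it uses socket.socketpair() as a stand-in for the real client/server connection, and the helper name timed_select is made up for illustration.

```python
import select
import socket
import time

def timed_select(sock, timeout):
    """Run one select() for readability on sock; return (readable, elapsed)."""
    start = time.time()
    readable, _, _ = select.select([sock], [], [], timeout)
    return readable, time.time() - start

# socketpair() stands in for the accepted client connection (assumption).
server_side, client_side = socket.socketpair()

# Nothing to read: a non-zero timeout blocks for the whole timeout...
readable, waited = timed_select(server_side, 0.5)
print("timeout=0.5 ->", readable, "after %.2fs" % waited)

# ...while a zero timeout is a pure poll and returns immediately.
readable, polled = timed_select(server_side, 0.0)
print("timeout=0.0 ->", readable, "after %.2fs" % polled)

# Once the peer has sent something, select() returns promptly either way.
client_side.send(b"client->server 0")
readable, ready_wait = timed_select(server_side, 0.5)
ready_count = len(readable)
print("data pending ->", ready_count, "socket(s) after %.2fs" % ready_wait)

server_side.close()
client_side.close()
```

Note that a zero timeout turns the server loop into a busy poll, so the loop counter keeps advancing at the cost of a CPU core; whether that trade-off is acceptable depends on what the test is meant to measure.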

Final non-blocking code (incorporating suggestions in answers):

tcp_server.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
from socket import MSG_DONTWAIT
#from socket import MSG_OOB  <--- for send()
from socket import socket
import socket as socket_module
import select
import errno
import fcntl
import time
import sys
import os

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))

    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ##   also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0])         # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int  = -1
        retval_str  = '__somethingswrong__'
        retval_code = 'BADFAIL'

    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code


host = ''
port = 6667     # IRC service
DEBUG = True

serv_sock = socket(AF_INET, SOCK_STREAM)
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # was SOCK_STREAM, which is not a socket option
serv_sock.bind((host, port))
serv_sock.listen(5)

#fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK)  # Make the socket non-blocking
serv_sock.setblocking(False)

sock_list = [serv_sock]

from_client_str = '__DEFAULT__'

to_client_idx = 0
loop_count = 0
need_send_select = False
while True:
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = sock_list
    else:
        send_sock_list = []

    #print "---"
    #print "LOOP_COUNT",  loop_count

    recv_ready_list, send_ready_list, exception_ready = select.select(
        sock_list, send_sock_list, [], 0.0)  # Last float is the select() timeout...


    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:

        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
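            try:
                clientsock, clientaddr = sock_fd.accept()
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='accept',
                    debugmsg=DEBUG)
                continue  # nothing was accepted; skip to the next ready socket

            # accept() returns a new socket; put it in non-blocking mode explicitly
            clientsock.setblocking(False)
            assert clientsock.gettimeout()==0.0, "client socket should be in non-blocking mode"
            sock_list.append(clientsock)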

        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(1024, MSG_DONTWAIT)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='recv',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    # socket unavailable to read()
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Client closed the socket...
                    sock_list.remove(sock_fd)
                else:
                    print "UNHANDLED SOCKET ERROR", errcode, errint, errstr
                    sys.exit(1)


            print "from_client_str: '{0}'".format(from_client_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = sock_list
    else:
        dynamic_list = send_ready_list
    ## NOTE:  socket code shouldn't walk this list unless a write is pending...
    ##      broadcast the same message to all clients...
    for sock_fd in dynamic_list:

        ## Ignore server's listening socket...
        if sock_fd is serv_sock:
            ## Only send() to accept()ed sockets...
            continue

        try:

            to_client_str = "server->client: {0}\n".format(to_client_idx)
            send_retval = sock_fd.send(to_client_str, MSG_DONTWAIT)
            ## send() returns the number of bytes written, on success
            ##     disabling assert check on sent bytes while using MSG_DONTWAIT
            #assert send_retval==len(to_client_str)

            to_client_idx += 1
            need_send_select = False
        except socket_module.error, e:
            errint, errstr, errcode = get_errno_info(e, op='send',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                need_send_select = True
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Client closed the socket...
                sock_list.remove(sock_fd)
            else:
                print "FATAL UNHANDLED SOCKET ERROR", errcode, errint, errstr
                sys.exit(1)

    loop_count += 1

tcp_client.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM
from socket import MSG_DONTWAIT    # non-blocking send/recv; see man 2 recv
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myServerHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 6667
DEBUG = True

def get_errno_info(e, op='', debugmsg=False):
    """Return verbose information from errno errors, such as errors returned by python socket()"""
    VALID_OP = set(['accept', 'connect', 'send', 'recv', 'read', 'write'])
    assert op.lower() in VALID_OP, "op must be: {0}".format(
        ','.join(sorted(VALID_OP)))

    ## ref: man 3 errno (in linux)... other systems may be man 2 intro
    ##   also see https://docs.python.org/2/library/errno.html
    try:
        retval_int = int(e.args[0])         # Example: 32
        retval_str = os.strerror(e.args[0]) # Example: 'Broken pipe'
        retval_code = errno.errorcode.get(retval_int, 'MODULEFAIL') # Ex: EPIPE
    except:
        ## I don't expect to get here unless something broke in python errno...
        retval_int  = -1
        retval_str  = '__somethingswrong__'
        retval_code = 'BADFAIL'

    if debugmsg:
        print "DEBUG: Can't {0}() on socket (errno:{1}, code:{2} / {3})".format(
            op, retval_int, retval_code, retval_str)
    return retval_int, retval_str, retval_code


connect_finished = False
while not connect_finished:
    try:
        c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
        # Set socket non-blocking
        #fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)
        c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
        c2s.setblocking(False)
        assert c2s.gettimeout()==0.0, "c2s socket should be in non-blocking mode"
        connect_finished = True
    except socket_module.error, e:
        errint, errstr, errcode = get_errno_info(e, op='connect',
            debugmsg=DEBUG)
        if errcode=='EINPROGRESS':
            pass

to_srv_idx = 0
need_send_select = False
while True:
    socket_list = [c2s]

    # Get the lists of sockets which can: take input, output, etc...
    if need_send_select:
        # Only do this after send() EAGAIN or EWOULDBLOCK...
        send_sock_list = socket_list
    else:
        send_sock_list = []
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, send_sock_list, [])

    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"

        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(1024, MSG_DONTWAIT)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            errint, errstr, errcode = get_errno_info(e, op='recv',
                debugmsg=DEBUG)
            if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                # Busy, try again later...
                print "recv() BLOCKED"
                continue
            elif errcode=='ECONNRESET' or errcode=='EPIPE':
                # Server ended normally...
                sys.exit(0)
        else:
            if from_srv_str=='':
                # recv() returned an empty string: the server closed the connection...
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)

        ## NOTE: if we get this far, we successfully received from_srv_str.
        ##    Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)

    ## Adding dynamic_list, per input from EJP, below...
    if need_send_select is False:
        dynamic_list = socket_list
    else:
        dynamic_list = send_ready_list
    for sock_fd in dynamic_list:
        # outgoing message to remote server
        if sock_fd is c2s:
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str, MSG_DONTWAIT)

                               ##
                time.sleep(1)  ## Client blocks the server here... Why????
                               ##

                to_srv_idx += 1
                need_send_select = False
            except socket_module.error, e:
                errint, errstr, errcode = get_errno_info(e, op='send',
                    debugmsg=DEBUG)
                if errcode=='EAGAIN' or errcode=='EWOULDBLOCK':
                    ## Try to send() later...
                    print "send() BLOCKED"
                    need_send_select = True
                    continue
                elif errcode=='ECONNRESET' or errcode=='EPIPE':
                    # Server ended normally...
                    sys.exit(0)

Original Question Code:

tcp_server.py

#!/usr/bin/python -u
from socket import AF_INET, SOCK_STREAM, SO_REUSEADDR, SOL_SOCKET
#from socket import MSG_OOB  <--- for send()
from socket import socket
import socket as socket_module
import select
import fcntl
import os

host = ''
port = 9997

serv_sock = socket(AF_INET, SOCK_STREAM)
serv_sock.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)  # was SOCK_STREAM, which is not a socket option
serv_sock.bind((host, port))
serv_sock.listen(5)

fcntl.fcntl(serv_sock, fcntl.F_SETFL, os.O_NONBLOCK)  # Make the socket non-blocking

sock_list = [serv_sock]

from_client_str = '__DEFAULT__'

to_client_idx = 0
loop_count = 0
while True:
    recv_ready_list, send_ready_list, exception_ready = select.select(sock_list, sock_list,
        [], 5)

    print "---"
    print "LOOP_COUNT",  loop_count

    ## Read all sockets which are input-ready... might be client or server...
    for sock_fd in recv_ready_list:

        # accept() if we're reading on the server socket...
        if sock_fd is serv_sock:
            clientsock, clientaddr = sock_fd.accept()
            sock_list.append(clientsock)

        # read input from the client socket...
        else:
            try:
                from_client_str = sock_fd.recv(4096)
                if from_client_str=='':
                    # Client closed the socket...
                    print "CLIENT CLOSED SOCKET"
                    sock_list.remove(sock_fd)
            except socket_module.error, e:
                print "WARNING RECV FAIL"


            print "from_client_str: '{0}'".format(from_client_str)

    for sock_fd in send_ready_list:
        if sock_fd is not serv_sock:
            try:
                to_client_str = "server->client: {0}\n".format(to_client_idx)
                sock_fd.send(to_client_str)
                to_client_idx += 1
            except socket_module.error, e:
                print "TO CLIENT SEND ERROR", e

    loop_count += 1

tcp_client.py

#!/usr/bin/python -u
    
from socket import AF_INET, SOCK_STREAM
from socket import gethostname, socket
import socket as socket_module
import select
import fcntl
import errno
import time
import sys
import os

## NOTE: Using this script to simulate a scheduler
SERVER_HOSTNAME = 'myHostname'
SERVER_DOMAIN = 'mydomain.local'
PORT = 9997

def handle_socket_error_continue(e):
    ## non-blocking socket info from:
    ## https://stackoverflow.com/a/16745561/667301
    print "HANDLE_SOCKET_ERROR_CONTINUE"
    err = e.args[0]
    if (err==errno.EAGAIN) or (err==errno.EWOULDBLOCK):
        print 'CLIENT DEBUG: No data input from server'
        return True
    else:
        print 'FROM SERVER RECV ERROR: {0}'.format(e)
        sys.exit(1)

c2s = socket(AF_INET, SOCK_STREAM) # Client to server socket...
c2s.connect(('.'.join((SERVER_HOSTNAME, SERVER_DOMAIN,)), PORT))
# Set socket non-blocking...
fcntl.fcntl(c2s, fcntl.F_SETFL, os.O_NONBLOCK)

to_srv_idx = 0
while True:
    socket_list = [c2s]

    # Get the lists of sockets which can: take input, output, etc...
    recv_ready_list, send_ready_list, exception_ready = select.select(
        socket_list, socket_list, [])

    for sock_fd in recv_ready_list:
        assert sock_fd is c2s, "Strange socket failure here"

        #incoming message from remote server
        try:
            from_srv_str = sock_fd.recv(4096)
        except socket_module.error, e:
            ## https://stackoverflow.com/a/16745561/667301
            err_continue = handle_socket_error_continue(e)
            if err_continue is True:
                continue
        else:
            if len(from_srv_str)==0:
                print "SERVER CLOSED NORMALLY"
                sys.exit(0)

        ## NOTE: if we get this far, we successfully received from_srv_str.
        ##    Anything caught above, is some kind of fail...
        print "from_srv_str: {0}".format(from_srv_str)

    for sock_fd in send_ready_list:
        #outgoing message to remote server
        if sock_fd is c2s:
            #to_srv_str = raw_input('Send to server: ')
            try:
                to_srv_str = 'client->server {0}'.format(to_srv_idx)
                sock_fd.send(to_srv_str)

                               ##
                time.sleep(1)  ## Client blocks the server here... Why????
                               ##

                to_srv_idx += 1
            except socket_module.error, e:
                print "TO SERVER SEND ERROR", e

TCP sockets are almost always ready for writing, unless their socket send buffer is full.

It is therefore incorrect to always select on writability for a socket. You should only do so after you've encountered a send failure due to EAGAIN/EWOULDBLOCK. Otherwise your server will spin mindlessly processing writeable sockets, which will usually be all of them.
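
A minimal sketch of that behaviour, assuming Python 3 and using socket.socketpair() in place of a real TCP connection (the helper is_writable and the 4096-byte chunk size are illustrative choices, not part of the original scripts):

```python
import select
import socket

# socketpair() stands in for a connected client/server pair (assumption).
sender, receiver = socket.socketpair()
sender.setblocking(False)
receiver.setblocking(False)

def is_writable(sock):
    """Poll (zero timeout) whether select() reports sock as ready-for-write."""
    _, writable, _ = select.select([], [sock], [], 0)
    return bool(writable)

# A freshly connected socket has an empty send buffer, so it is writable.
writable_when_idle = is_writable(sender)

# Keep sending while the peer reads nothing, until the send buffer fills up.
chunk = b"x" * 4096
sent_total = 0
while True:
    try:
        sent_total += sender.send(chunk)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK
        break

# Only at this point is it useful to ask select() about writability.
writable_when_full = is_writable(sender)

# Drain the receiving side, as a client would once it wakes up and reads.
drained = 0
while True:
    try:
        data = receiver.recv(65536)
    except BlockingIOError:  # nothing left to read
        break
    if not data:
        break
    drained += len(data)

writable_after_drain = is_writable(sender)

print("idle:", writable_when_idle)
print("buffer full after", sent_total, "bytes:", writable_when_full)
print("after peer drained", drained, "bytes:", writable_after_drain)

sender.close()
receiver.close()
```

The exact number of bytes that fit before EAGAIN depends on the platform's buffer sizes, so the sketch only reports it rather than assuming a value.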

However, when I put a time.sleep(1) statement after the client does an fd.send() to the server, the TCP server code intermittently pauses while the client is sleeping.

AFAICT from running the provided code (nice self-contained example, btw), the server is behaving as intended.

In particular, the semantics of the select() call are that select() shouldn't return until there is something for the thread to do. Having the thread block inside select() is a good thing when there is nothing that the thread can do right now anyway, since it prevents the thread from spinning the CPU for no reason.

So in this case, your server program has told select() that it wants select() to return only when at least one of the following conditions is true:

  1. serv_sock is ready-for-read (which is to say, a new client wants to connect to the server now)
  2. serv_sock is ready-for-write (I don't believe this ever actually happens on a listening-socket, so this criterion can probably be ignored)
  3. clientsock is ready-for-read (that is, the client has sent some bytes to the server and they are waiting in clientsock's buffer for the server thread to recv() them)
  4. clientsock is ready-for-write (that is, clientsock has some room in its outgoing-data-buffer that the server could send() data into if it wants to send some data back to the client)
  5. Five seconds have passed since the call to select() started blocking.

I see (via print-debugging) that when your server program blocks, it is blocking inside select(), which indicates that none of the 5 conditions above is being met during the blocking period.

Why is that? Well, let's go down the list.

  1. Not met because no other clients are trying to connect
  2. Not met because this never happens
  3. Not met because the server has read all of the data that the connected client has sent (and since the connected client is itself sleeping, it's not sending any more data)
  4. Not met because the server has filled up the outgoing-data buffer of its clientsock (because the client program is sleeping, it's only reading the data coming from the server intermittently, and the TCP layer guarantees lossless/in-order transmission, so once clientsock's outgoing-data-buffer is full, clientsock won't select-as-ready-for-write unless/until the client reads at least some data from its end of the connection)
  5. Not met because 5 seconds haven't elapsed yet since select() started blocking.

So is this behavior actually a problem for the server? In fact it is not, because the server will still be responsive to any other clients that connect to the server. In particular, select() will still return right away whenever serv_sock or any other client's socket selects as ready-for-read (or ready-for-write), and so the server can handle the other clients just fine while waiting for your hacked/slow client to wake up.

The hacked/slow client might be a problem for the user, but there's nothing the server can really do about that (short of forcibly disconnecting the client's TCP connection, or maybe printing out a log message requesting that someone debug the connected client program, I suppose :)).

I agree with EJP, btw -- selecting on ready-for-write should only be done on sockets that you actually want to write some data to. If you don't actually have any desire to write to the socket ASAP, then it's pointless and counterproductive to instruct select() to return as soon as that socket is ready-for-write: the problem with doing so is that you're likely to spin the CPU a lot whenever any socket's outgoing-data-buffer is less-than-full (which in most applications, is most of the time!). The user-visible symptom of the problem would be that your server program is using up 100% of a CPU core even when it ought to be idle or mostly-idle.
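
One way to apply this is to keep a per-socket queue of unsent bytes and ask select() about writability only for sockets whose queue is non-empty. The following is a rough Python 3 sketch under that assumption; the names select_lists, flush and outbox are hypothetical, and socket.socketpair() stands in for an accepted client connection.

```python
import select
import socket

def select_lists(all_socks, outbox):
    """Read interest in every socket; write interest only where bytes are queued."""
    rlist = list(all_socks)
    wlist = [s for s in all_socks if outbox.get(s)]
    return rlist, wlist

def flush(sock, outbox):
    """Send what the kernel accepts now; keep any remainder queued."""
    try:
        sent = sock.send(outbox[sock])
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK: buffer still full
        return
    outbox[sock] = outbox[sock][sent:]

# socketpair() stands in for an accepted client connection (assumption).
conn, peer = socket.socketpair()
conn.setblocking(False)
outbox = {conn: b""}

# Idle: nothing is queued, so conn is not handed to select() for writing
# and select() has no reason to report it as ready-for-write.
rlist, idle_wlist = select_lists([conn], outbox)
_, idle_writable, _ = select.select(rlist, idle_wlist, [], 0)

# Queue a message; only now does conn appear in the write list.
outbox[conn] += b"server->client: 0\n"
rlist, wlist = select_lists([conn], outbox)
_, writable, _ = select.select(rlist, wlist, [], 0)
for sock in writable:
    flush(sock, outbox)

received = peer.recv(1024)
remaining = outbox[conn]
print("idle write list:", idle_wlist, "-> ready:", idle_writable)
print("peer received:", received, "bytes still queued:", len(remaining))

conn.close()
peer.close()
```

With the write list empty while idle, the same loop could call select() with a real timeout (or none) and sleep in the kernel until a client actually sends something, instead of spinning.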

