From asyncore's documentation: https://docs.python.org/2/library/asyncore.html
import asyncore, socket

class HTTPClient(asyncore.dispatcher):

    def __init__(self, host, path):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, 80))
        self.buffer = 'GET %s HTTP/1.0\r\n\r\n' % path

    def handle_connect(self):
        pass

    def handle_close(self):
        self.close()

    def handle_read(self):
        print self.recv(8192)

    def writable(self):
        return (len(self.buffer) > 0)

    def handle_write(self):
        sent = self.send(self.buffer)
        self.buffer = self.buffer[sent:]

client = HTTPClient('www.python.org', '/')
asyncore.loop()
Now suppose instead we have:
def handle_read(self):
    data = self.recv(8192)
    # SOME REALLY LONG AND COMPLICATED THING
Is this handled in asyncore itself due to asyncore's polling/select methodology, or do I need to do:
def handle_read(self):
    data = self.recv(8192)
    h = Handler(data)
    h.start()

class Handler(threading.Thread):

    def __init__(self, data):
        threading.Thread.__init__(self)
        self.data = data

    def run(self):
        # LONG AND COMPLICATED THING WITH DATA
        pass
If I do need a thread, do I want h.join() after start? It seems to work, but since join blocks, I'm not exactly sure why.
Is this handled in asyncore itself due to asyncore's polling/select methodology?
No, asyncore cannot handle a long blocking task in handle_read() by itself, since there is only one thread. While that thread is busy with the long job, nothing can interrupt it to service the socket.
However, such a blocking implementation can still make sense. The only cost is that the network transfer becomes slower: for example, if the long task takes 1 second, the maximum data transfer rate is 8192 bytes per second. Although the data rate drops, the network connection stays stable and works as expected. That is handled by the TCP implementation in the operating system kernel.
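As a back-of-the-envelope sketch of that throughput ceiling (max_rate is an invented helper name; the 8192-byte read size and 1-second task come from the example above):

```python
def max_rate(bytes_per_read=8192, task_seconds=1.0):
    """Upper bound on throughput when handle_read() blocks for
    task_seconds after reading at most bytes_per_read bytes."""
    return bytes_per_read / task_seconds

print(max_rate())           # 8192.0 bytes/s, regardless of link speed
print(max_rate(8192, 2.0))  # 4096.0 bytes/s if the task takes 2 s
```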
...or do I need to do...? If I do need a thread, do I want h.join() after start?
Neither of those thread usages makes sense as written. However, it is still possible to use a helper thread to download data at the maximum rate and to process that data in parallel; see below for explanations.
TCP provides reliable, ordered, and error-checked delivery of a stream.
Flow control — limits the rate a sender transfers data to guarantee reliable delivery. The receiver continually hints the sender on how much data can be received (controlled by the sliding window). When the receiving host's buffer fills, the next acknowledgment contains a 0 in the window size, to stop transfer and allow the data in the buffer to be processed.
...
When a receiver advertises a window size of 0, the sender stops sending data and starts the persist timer. The persist timer is used to protect TCP from a deadlock situation that could arise if a subsequent window size update from the receiver is lost, and the sender cannot send more data until receiving a new window size update from the receiver. When the persist timer expires, the TCP sender attempts recovery by sending a small packet so that the receiver responds by sending another acknowledgement containing the new window size.
So, when data is not read from the socket because of the long task in handle_read(), the socket buffer becomes full. The TCP connection stalls and does not receive any new data packets. After recv() new data can be received again, so a TCP ACK packet is sent to the sender to update the TCP window size.
Similar behavior can be observed with file-downloader applications when the transfer rate is limited in the settings. For example, if the limit is set to 1 KB/s, the downloader may call recv(1000) once per second. Even if the physical network connection is able to carry 1 MB/s, only 1 KB/s will be received. In that case it is possible to see TCP Zero Window and TCP Window Update packets with tcpdump or Wireshark.
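Such a throttled downloader can be sketched as follows (a simplified illustration, not any real application's code; throttled_recv and limit_bps are invented names; the demonstration uses a local socket pair instead of a network peer):

```python
import socket
import time

def throttled_recv(sock, limit_bps=1000):
    """Read at most limit_bps bytes per second. While we sleep, the kernel
    socket buffer fills up and TCP advertises a shrinking window."""
    while True:
        chunk = sock.recv(limit_bps)
        if not chunk:
            break  # peer closed the connection
        yield chunk
        time.sleep(1)

# Demonstration on a local socket pair:
a, b = socket.socketpair()
a.sendall(b"x" * 500)
a.close()
data = b"".join(throttled_recv(b, limit_bps=1000))
print(len(data))  # 500
```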
Although the application will work with a long blocking task, the network connection is usually considered the bottleneck, so it may be better to release the network as soon as possible.
If the long task takes much longer than the download, the simplest solution is to download everything first and only then process the downloaded data. However, that may not be acceptable if the download time is commensurate with the processing time. For example, 1 hour of download plus 2 hours of processing can be done in 2 hours total if the processing runs in parallel with the download.
If a new thread is created in handle_read() and the main thread does not wait for the helper thread to finish (no join()), the application may create a huge number of threads. Note that handle_read() may be called thousands of times per second, and if each long task takes more than a second, the application may create hundreds of threads and finally be killed by an exception. Such a solution does not make sense, since there is no control over the number of threads, and the data blocks handled by those threads are also arbitrary: recv(8192) receives at most 8192 bytes, but it may also return a smaller block.
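Because recv(8192) may return fewer bytes than requested, code that needs a fixed-size block has to loop until it has collected enough. A minimal sketch (recv_exact is a hypothetical helper name, demonstrated here on a local socket pair):

```python
import socket

def recv_exact(sock, n):
    """Call recv() repeatedly until exactly n bytes arrive
    or the peer closes the connection early."""
    parts = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            break  # connection closed before n bytes arrived
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)

a, b = socket.socketpair()
a.sendall(b"hello world!")
first = recv_exact(b, 5)
print(first)  # b'hello'
```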
It also does not make sense to create a thread and immediately block the main thread with join(), since such a solution is no better than the initial solution without any thread.
A helper thread with a later join() may be used to do something in parallel. For example:
# Start detached thread
h.start()
# Do something in parallel to that thread
# ...
# Wait the thread to finish
h.join()
However, that is not the case here.
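To make the start/work/join pattern above concrete, here is a self-contained sketch (long_task, the 0.1-second sleep, and the results list are illustrative placeholders for real work):

```python
import threading
import time

results = []

def long_task():
    time.sleep(0.1)  # stand-in for the long computation
    results.append("worker done")

h = threading.Thread(target=long_task)
h.start()                            # worker runs in the background
results.append("main kept working")  # main thread is not blocked here
h.join()                             # only now block until the worker finishes
print(results)
```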
It is possible to create one persistent worker thread (or several, to use all CPU cores) that is responsible for data processing. It should be started before asyncore.loop(), for example:
handler = Handler()
handler.start()
asyncore.loop()
Now once the handler thread is ready it can take all downloaded data for processing, and at the same time the main thread may continue with the download. While the handler thread is busy, the downloader appends data to its data buffer. Proper synchronization between the threads is needed:

- while the downloader modifies the buffer, the handler thread should wait before it gets access to that buffer;
- while the handler works with the buffer, the downloader should wait before it is able to append to the buffer;
- if the buffer is empty, the handler should sleep and wait for new downloaded data.

That can be achieved using a threading condition object, as in the producer-consumer example:
# create a new condition variable on __init__
cv = threading.Condition()

# Consume one item by Handler
cv.acquire()
while not an_item_is_available():
    cv.wait()
get_an_available_item()
cv.release()
# DO SOME REALLY LONG AND COMPLICATED THING

# Produce one item by Downloader
cv.acquire()
make_an_item_available()
cv.notify()
cv.release()
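Putting the two halves together, a runnable sketch of that producer-consumer scheme might look like this (buf, done, and the trivial item * 10 "processing" are illustrative placeholders):

```python
import threading

buf = []      # shared buffer, guarded by cv
done = False  # set by the downloader when there is no more data
processed = []
cv = threading.Condition()

def downloader():  # producer: appends items and notifies the consumer
    global done
    for i in range(3):
        with cv:
            buf.append(i)
            cv.notify()
    with cv:
        done = True
        cv.notify()

def handler():  # consumer: takes items under the lock,
    while True:  # but runs the long task outside of it
        with cv:
            while not buf and not done:
                cv.wait()
            if not buf and done:
                return
            item = buf.pop(0)
        processed.append(item * 10)  # long task runs with the lock released

h = threading.Thread(target=handler)
h.start()
downloader()
h.join()
print(processed)  # [0, 10, 20]
```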
Here make_an_item_available() may mean appending downloaded data to the buffer and/or setting some other shared state variables (for example in handle_close()). The handler thread should do its long task after cv.release(), so that during that long task the downloader is able to acquire the lock and append new data to the buffer.
This is along the same lines as a question I asked previously here.
If you have a LONG AND COMPLICATED THING WITH DATA that you need to achieve, executing it within the event loop will block the event loop from doing anything else until your task has completed. The same is true if you spawn a thread and then join() it (join simply blocks execution until the joined thread is finished). However, if you spawn a worker thread and let it run to completion on its own, then the event loop is free to continue processing while your long task completes in parallel.
I am posting my own answer because it was inspired by Orest Hera's answer, but since I have knowledge of my workload, it is a slight variant. My workload is such that requests can arrive in bursts, but the bursts are sporadic (non-stationary). Moreover, the requests need to be processed in the order they are received. So, here is what I did:
#! /usr/bin/env python3
import asyncore  # https://docs.python.org/2/library/asyncore.html
import socket
import threading
import queue
import time

fqueue = queue.Queue()

class Handler(threading.Thread):

    def __init__(self):
        threading.Thread.__init__(self)
        self.keep_reading = True

    def run(self):
        while self.keep_reading:
            if fqueue.empty():
                time.sleep(1)
            else:
                data = fqueue.get()
                # PROCESS data

    def stop(self):
        self.keep_reading = False

class Listener(asyncore.dispatcher):  # http://effbot.org/librarybook/asyncore.htm

    def __init__(self, host, port):
        asyncore.dispatcher.__init__(self)
        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        self.connect((host, port))

    def handle_read(self):
        data = self.recv(40)  # pretend it always waits for 40 bytes
        fqueue.put(data)

    def start(self):
        try:
            h = Handler()
            h.start()
            asyncore.loop()
        except KeyboardInterrupt:
            pass
        finally:
            h.stop()
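One note on the Handler above: queue.Queue already provides blocking semantics, so the empty()/sleep(1) polling loop can be replaced by a blocking get() plus a sentinel object for shutdown, which still preserves arrival order. A sketch of that alternative (SENTINEL and the trivial worker body are illustrative):

```python
import queue
import threading

fqueue = queue.Queue()
SENTINEL = object()  # unique marker telling the worker to stop
processed = []

def worker():
    while True:
        item = fqueue.get()  # blocks until an item arrives; no polling sleep
        if item is SENTINEL:
            break
        processed.append(item)  # PROCESS item, strictly in arrival order

t = threading.Thread(target=worker)
t.start()
for i in range(3):
    fqueue.put(i)
fqueue.put(SENTINEL)
t.join()
print(processed)  # [0, 1, 2]
```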