We have a queue of jobs and workers process these jobs one at a time. Each job requires us to format some data and issue an HTTP POST request, with the data as the request payload.
How can we have each worker issue these HTTP POST requests asynchronously in a single-threaded, non-blocking manner? We don't care about the response from the request -- all we want is for the request to execute as soon as possible and then for the worker to immediately move onto the next job.
We have explored using gevent and the grequests library (see Why does gevent.spawn not execute the parameterized function until a call to Greenlet.join?). Our worker code looks something like this:
def execute_task(worker, job):
    print "About to spawn request"
    greenlet = gevent.spawn(requests.post, url, params=params)
    print "Request spawned, about to call sleep"
    gevent.sleep()
    print "Greenlet status: ", greenlet.ready()
The first print statement executes, but the second and third print statements never get printed and the url is never hit.
How can we get these asynchronous requests to execute?
1) make a Queue.Queue object
2) make as many "worker" threads as you like which loop and read from the Queue.Queue
3) feed the jobs onto the Queue.Queue
The worker threads will read jobs off the Queue.Queue in the order the jobs are placed on it.
Here is an example that reads lines from a file and puts them on a Queue.Queue:
import sys
import urllib2
import urllib
from Queue import Queue
import threading

THEEND = "TERMINATION-NOW-THE-END"

# read from file into Queue.Queue asynchronously
class QueueFile(threading.Thread):
    def run(self):
        if not isinstance(self.myq, Queue):
            print "Queue not set to a Queue"
            sys.exit(1)
        h = open(self.f, 'r')
        for l in h:
            self.myq.put(l.strip()) # this will block if the queue is full
        h.close()
        self.myq.put(THEEND) # sentinel telling the workers there are no more jobs

    def set_queue(self, q):
        self.myq = q

    def set_file(self, f):
        self.f = f
Here is an idea of what a worker thread might look like (example only):
class myWorker(threading.Thread):
    def run(self):
        while True:
            data = self.q.get() # read from the queue; blocks until a job arrives
            if data == THEEND:
                self.q.put(THEEND) # put the sentinel back so the other workers stop too
                break
            try:
                req = urllib2.Request("http://192.168.1.10/url/path")
                req.add_data(urllib.urlencode({'line': data})) # urlencode expects a mapping
                h1 = urllib2.urlopen(req, timeout=10)
                res = h1.read()
                assert(len(res) > 80)
            except urllib2.HTTPError, e:
                print e
            except urllib2.URLError, e:
                print e
                sys.exit()

    def set_queue(self, q):
        self.q = q
To start an object based on threading.Thread, create the instance and then call start() on it.
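Wiring the pieces above together might look something like this (a minimal sketch; the file name 'jobs.txt' and the worker count of 4 are illustrative assumptions):

q = Queue(maxsize=100) # bounded, so QueueFile blocks instead of reading the whole file at once

qf = QueueFile()
qf.set_queue(q)
qf.set_file('jobs.txt')
qf.start() # begins feeding lines onto the queue

for _ in range(4):
    w = myWorker()
    w.set_queue(q)
    w.start() # begins reading jobs off the queue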
You'd have to run it in different threads or use the built-in asyncore library. Most libraries will utilize threading without you even knowing, or they will rely on asyncore, which is a standard part of Python.
Here's a combination of threading and asyncore:
#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import asyncore, socket
from threading import *
from time import sleep
from os import _exit
from logger import * # <- Non-standard library containing a log function
from config import * # <- Non-standard library containing settings such as "server"

class logDispatcher(Thread, asyncore.dispatcher):
    def __init__(self, config=None):
        self.inbuffer = ''
        self.buffer = ''
        self.lockedbuffer = False
        self.is_writable = False
        self.is_connected = False
        self.exit = False
        self.initiated = False

        asyncore.dispatcher.__init__(self)
        Thread.__init__(self)

        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((server, server_port))
        except:
            log('Could not connect to ' + server, 'LOG_SOCK')
            return None

        self.start()

    def handle_connect_event(self):
        self.is_connected = True

    def handle_connect(self):
        self.is_connected = True
        log('Connected to ' + str(server), 'LOG_SOCK')

    def handle_close(self):
        self.is_connected = False
        self.close()

    def handle_read(self):
        data = self.recv(8192)
        while self.lockedbuffer:
            sleep(0.01)
        self.inbuffer += data

    def handle_write(self):
        while self.is_writable:
            sent = self.send(self.buffer)
            sleep(1)
            self.buffer = self.buffer[sent:]
            if len(self.buffer) <= 0:
                self.is_writable = False
            sleep(0.01)

    def _send(self, what):
        self.buffer += what + '\r\n'
        self.is_writable = True

    def run(self):
        self._send('GET / HTTP/1.1\r\n')

while 1:
    logDispatcher() # <- Initiate one for each request.
    asyncore.loop(0.1)
    log('All threads are done, next loop in 10', 'CORE')
    sleep(10)
Or you could simply spawn a thread that does the job and then dies:
import socket
from threading import *

class worker(Thread):
    def __init__(self, host, postdata):
        Thread.__init__(self)
        self.host = host
        self.postdata = postdata
        self.start()

    def run(self):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # create the socket
        sock.connect((self.host, 80))
        sock.send(self.postdata) # pseudo: the payload still needs proper HTTP framing
        sock.close()

for data in postDataObjects:
    worker('example.com', data)
If you need to limit the number of threads (if you're sending over 5k posts or so it might get taxing on the system), just loop on while len(threading.enumerate()) > 1000: sleep(0.1) and let the submitting loop wait for a few threads to die off.
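For example, the submit loop with that throttle might look like this (a sketch; the 1000-thread cap and postDataObjects come from the example above):

import threading
from time import sleep

for data in postDataObjects:
    # wait until the number of live threads drops below the cap
    while len(threading.enumerate()) > 1000:
        sleep(0.1)
    worker('example.com', data)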
You may want to use the join method instead of sleep and then check the status. If you want to execute one request at a time, that will solve the problem. Modifying your code slightly to test it, it seems to work fine:
import gevent
import requests

def execute_task(worker, job):
    print "About to spawn request"
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})
    print "Request spawned, about to call sleep"
    gevent.sleep()
    print "Greenlet status: ", greenlet.ready()
    print greenlet.get()

execute_task(None, None)
Gives the results:
About to spawn request
Request spawned, about to call sleep
Greenlet status: True
<Response [200]>
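For reference, the join-based variant suggested at the start of this answer might look something like this (a sketch reusing the same placeholder URL; execute_task_join is a hypothetical name):

def execute_task_join(worker, job):
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})
    greenlet.join() # block until the greenlet has finished
    print "Greenlet status: ", greenlet.ready() # now guaranteed to be True
    print greenlet.get()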
Is there more going on in this Python process that could be blocking Gevent from running this greenlet?
Wrap your url and params in a list, then pop one pair at a time into a task pool (the task pool here holds one task or is empty). Create threads that read tasks from the task pool; when a thread gets a task and has sent the request, pop another pair off the list (i.e. this is effectively a list acting as a queue).
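A minimal sketch of that idea, assuming a shared list of (url, params) pairs (the URLs, payloads, and thread count below are placeholders):

import threading
import requests

tasks = [('http://example.com/a', {'k': 1}),
         ('http://example.com/b', {'k': 2})] # wrap your url and params in a list
lock = threading.Lock()

def post_worker():
    while True:
        with lock:
            if not tasks:
                return
            url, params = tasks.pop() # take one task from the pool
        requests.post(url, data=params) # send the request, then loop for the next task

for _ in range(4):
    threading.Thread(target=post_worker).start()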