
How do I send asynchronous HTTP requests in Python one at a time?

We have a queue of jobs and workers process these jobs one at a time. Each job requires us to format some data and issue an HTTP POST request, with the data as the request payload.

How can we have each worker issue these HTTP POST requests asynchronously in a single-threaded, non-blocking manner? We don't care about the response from the request -- all we want is for the request to execute as soon as possible and then for the worker to immediately move onto the next job.

We have explored using gevent and the grequests library (see Why does gevent.spawn not execute the parameterized function until a call to Greenlet.join?). Our worker code looks something like this:

def execute_task(worker, job):

    print "About to spawn request"
    greenlet = gevent.spawn(requests.post, url, params=params)

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()

The first print statement executes, but the second and third print statements never get printed and the url is never hit.

How can we get these asynchronous requests to execute?

1) make a Queue.Queue object

2) make as many "worker" threads as you like which loop and read from the Queue.Queue

3) feed the jobs onto the Queue.Queue

The worker threads will read off the Queue.Queue in the order they are placed on it

Here is an example that reads lines from a file and puts them on a Queue.Queue:

import sys
import urllib2
import urllib
from Queue import Queue
import threading

THEEND = "TERMINATION-NOW-THE-END"


#read from file into Queue.Queue asynchronously
class QueueFile(threading.Thread):
    def run(self):
        if not(isinstance(self.myq, Queue)):
            print "Queue not set to a Queue"
            sys.exit(1)
        h = open(self.f, 'r')
        for l in h:
            self.myq.put(l.strip())  # this will block if the queue is full
        self.myq.put(THEEND)

    def set_queue(self, q):
        self.myq = q

    def set_file(self, f):
        self.f = f

An idea of what a worker thread might be like (example only)

class myWorker(threading.Thread):
    def set_queue(self, q):
        self.q = q

    def run(self):
        while True:
            try:
                data = self.q.get()  # read from the queue (FIFO); blocks until a job arrives
                if data == THEEND:
                    break

                req = urllib2.Request("http://192.168.1.10/url/path")
                # urlencode expects a mapping, so wrap the raw line in a dict
                req.add_data(urllib.urlencode({'line': data}))
                h1 = urllib2.urlopen(req, timeout=10)
                res = h1.read()
                assert(len(res) > 80)

            except urllib2.HTTPError, e:
                print e

            except urllib2.URLError, e:
                print e
                sys.exit(1)

To start one of these threading.Thread-based objects, create the instance and then call its start() method.
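Wired together in modern Python 3 terms (the Queue module became queue), the same producer/worker pattern looks like the minimal sketch below; the upper() call is just a stand-in for formatting the data and issuing the HTTP POST, and the sentinel plays the role of THEEND above:

```python
import queue
import threading

SENTINEL = object()  # marks the end of the work, like THEEND above

def producer(q, items):
    for item in items:
        q.put(item)      # blocks if the queue is full
    q.put(SENTINEL)      # tell the worker to stop

def worker(q, results):
    while True:
        item = q.get()
        if item is SENTINEL:
            break
        results.append(item.upper())  # stand-in for the HTTP POST

q = queue.Queue()
results = []
t_prod = threading.Thread(target=producer, args=(q, ["a", "b", "c"]))
t_work = threading.Thread(target=worker, args=(q, results))
t_prod.start()
t_work.start()
t_prod.join()
t_work.join()
print(results)  # ['A', 'B', 'C'] -- jobs are processed in the order they were queued
```

Because a single worker pulls from a FIFO queue, the jobs execute strictly in submission order, which is exactly the one-at-a-time behaviour the question asks for.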

You'd have to run it in different threads or use the built-in asyncore library. Most libraries utilize threading without you even knowing it, or they rely on asyncore, which is a standard part of Python.

Here's a combination of Threading and asyncore:

#!/usr/bin/python
# -*- coding: iso-8859-15 -*-
import asyncore, socket
from threading import *
from time import sleep
from os import _exit
from logger import *  # <- Non-standard library containing a log function
from config import *  # <- Non-standard library containing settings such as "server"

class logDispatcher(Thread, asyncore.dispatcher):
    def __init__(self, config=None):
        self.inbuffer = ''
        self.buffer = ''
        self.lockedbuffer = False
        self.is_writable = False

        self.is_connected = False

        self.exit = False
        self.initiated = False

        asyncore.dispatcher.__init__(self)
        Thread.__init__(self)

        self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            self.connect((server, server_port))
        except:
            log('Could not connect to ' + server, 'LOG_SOCK')
            return None

        self.start()

    def handle_connect_event(self):
        self.is_connected = True

    def handle_connect(self):
        self.is_connected = True
        log('Connected to ' + str(server), 'LOG_SOCK')

    def handle_close(self):
        self.is_connected = False
        self.close()

    def handle_read(self):
        data = self.recv(8192)
        while self.lockedbuffer:
            sleep(0.01)

        self.inbuffer += data


    def handle_write(self):
        while self.is_writable:
            sent = self.send(self.buffer)
            sleep(1)

            self.buffer = self.buffer[sent:]
            if len(self.buffer) <= 0:
                self.is_writable = False
            sleep(0.01)

    def _send(self, what):
        self.buffer += what + '\r\n'
        self.is_writable = True

    def run(self):
        self._send('GET / HTTP/1.1\r\n')

while 1:
    logDispatcher() # <- Initiate one for each request.
    asyncore.loop(0.1)
    log('All threads are done, next loop in 10', 'CORE')
    sleep(10)

Or you could simply do a thread that does the job and then dies.

from threading import *
class worker(Thread):
    def __init__(self, host, postdata):
        Thread.__init__(self)
        self.host = host
        self.postdata = postdata
        self.start()
    def run(self):
        sock.send(self.postdata) #Pseudo, create the socket!

for data in postDataObjects:
    worker('example.com', data)

If you need to limit the number of threads (if you're sending over 5k POSTs or so, it might get taxing on the system), just put a while len(threading.enumerate()) > 1000: sleep(0.1) in the spawning loop and let it wait for a few threads to die off.
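A hedged sketch of that throttling idea in modern Python 3, using threading.enumerate() to count live threads; the limit of 4 and the dummy job function are illustrative, not part of the original answer:

```python
import threading
import time

def limited_spawn(target, args, limit=4):
    # Wait until the number of live threads (including the main
    # thread) drops below the limit before spawning another one.
    while len(threading.enumerate()) >= limit:
        time.sleep(0.01)
    t = threading.Thread(target=target, args=args)
    t.start()
    return t

def job(n, out):
    # stand-in for formatting data and sending the HTTP POST
    time.sleep(0.01)
    out.append(n)

out = []
threads = [limited_spawn(job, (i, out)) for i in range(10)]
for t in threads:
    t.join()
print(sorted(out))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The spawning loop itself blocks, so the caller never has more than a handful of worker threads alive at once, which keeps memory and scheduler pressure bounded.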

You may want to use the join method instead of calling sleep and then checking the status. If you want to execute one request at a time, that will solve the problem. Modifying your code slightly to test it, it seems to work fine:

import gevent
import requests

def execute_task(worker, job):

    print "About to spawn request"
    greenlet = gevent.spawn(requests.get, 'http://example.com', params={})

    print "Request spawned, about to call sleep"
    gevent.sleep()

    print "Greenlet status: ", greenlet.ready()
    print greenlet.get()

execute_task(None, None)

Gives the results:

About to spawn request
Request spawned, about to call sleep
Greenlet status:  True
<Response [200]>

Is there more going on in this Python process that could be blocking Gevent from running this greenlet?
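For completeness, the same fire-and-forget, one-at-a-time pattern can be sketched with the standard-library concurrent.futures module (available since Python 3.2); post here is a hypothetical stand-in for requests.post, and max_workers=1 is what enforces the one-request-at-a-time requirement:

```python
from concurrent.futures import ThreadPoolExecutor

def post(url, data):
    # stand-in for requests.post(url, data=data)
    return (url, data)

# One worker thread means submitted jobs run strictly one at a time,
# while submit() itself returns immediately (fire-and-forget).
executor = ThreadPoolExecutor(max_workers=1)
futures = [executor.submit(post, "http://example.com", i) for i in range(3)]

# If you truly don't care about the responses, skip this; it is only
# here to show that the jobs completed in order.
results = [f.result() for f in futures]
executor.shutdown()
print(results)
```

Each submit() call queues the job and returns at once, so the caller can move straight on to the next job, exactly the behaviour the question asks for.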

Wrap your url and params in a list, then pop one pair at a time into a task pool (the pool holds a single task or is empty). Create a thread that reads tasks from the pool; when the thread gets a task it sends the request, then pops the next pair from the list (i.e., the list effectively acts as a queue).
