
Multiple simultaneous HTTP requests

I'm trying to take a list of items and check each one for a status change that results from processing by the API. The list is populated manually and can contain up to several thousand items.

I'm trying to write a script that makes multiple simultaneous connections to the API to keep checking for the status change. For each item, once the status changes, the attempts to check must stop. Based on other posts on Stack Overflow (specifically, What is the fastest way to send 100,000 HTTP requests in Python?), I've come up with the following code. But the script always stops after processing the list once. What am I doing wrong?

One additional issue I'm facing is that the keyboard interrupt never fires (I'm pressing Ctrl+C, but it does not kill the script).

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

requestURLBase = "https://example.com/api"
apiKey = "123456"

concurrent = 200

keepTrying = 1

def doWork():
    while keepTrying == 1:
        url = q.get()
        status, body, url = checkStatus(url)
        checkResult(status, body, url)
        q.task_done()

def checkStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(requestURLBase)
        conn.request("GET", url.path)
        res = conn.getresponse()
        respBody = res.read()
        conn.close()
        return res.status, respBody, ourl #Status can be 210 for error or 300 for successful API response
    except:
        print "ErrorBlock"
        print res.read()
        conn.close()
        return "error", "error", ourl

def checkResult(status, body, url):
    if "unavailable" not in body:
        print status, body, url
        keepTrying = 1
    else:
        keepTrying = 0

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for value in open('valuelist.txt'):
        fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
        print fullUrl
        q.put(fullUrl)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

I'm new to Python, so there could be syntax errors, and I'm definitely not familiar with multi-threading, so perhaps I'm doing something else wrong as well.

In your code, the list is only read once. It should be something like this (with the except clause from your original script reattached):

try:
    while True:  # keep re-queuing the list until interrupted
        for value in open('valuelist.txt'):
            fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
            print fullUrl
            q.put(fullUrl)
        q.join()
except KeyboardInterrupt:
    sys.exit(1)

For the interrupt issue, remove the bare except line in checkStatus, or make it except Exception. A bare except catches all exceptions, including KeyboardInterrupt (which Ctrl+C raises) and SystemExit (which sys.exit raises), so it stops the Python process from terminating.
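As a minimal illustration (standard library only; the sleep is just a stand-in for a blocking API call): in Python 2.5+, KeyboardInterrupt and SystemExit derive from BaseException rather than Exception, so except Exception lets Ctrl+C propagate while a bare except swallows it.

import time

def swallows_ctrl_c():
    try:
        time.sleep(10)  # stand-in for a blocking network call
    except:  # bare except: catches KeyboardInterrupt and SystemExit too
        print "Ctrl+C was swallowed here"

def lets_ctrl_c_through():
    try:
        time.sleep(10)
    except Exception:  # KeyboardInterrupt is not an Exception subclass
        print "only ordinary errors land here; Ctrl+C propagates"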

If I may make a couple of general comments, though:

  • Threads do not scale well to such high concurrency
  • Creating a new connection every time is not efficient

What I would suggest is

  1. Use gevent for asynchronous network I/O
  2. Pre-allocate a queue of connections, the same size as the concurrency number, and have checkStatus grab a connection object whenever it needs to make a call. That way the connections stay alive and get reused, and you avoid the overhead of repeatedly creating and destroying them and the extra memory use that goes with it. A sketch combining both suggestions follows this list.
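A minimal sketch of both suggestions together, assuming Python 2 (matching the question's httplib and print syntax) and the question's example.com endpoint; this is the shape of the idea, not a drop-in replacement for the full script:

from gevent import monkey
monkey.patch_all()  # make socket/httplib calls cooperative; must run before other imports

from gevent.pool import Pool
from gevent.queue import Queue
import httplib

concurrent = 200
host = "example.com"  # host only; httplib takes the path separately
apiKey = "123456"

# Pre-allocate one connection per worker; they stay alive and get reused.
connections = Queue()
for _ in range(concurrent):
    connections.put(httplib.HTTPSConnection(host))

def checkStatus(path):
    conn = connections.get()  # borrow a live connection from the pool
    try:
        conn.request("GET", path)
        res = conn.getresponse()
        return res.status, res.read(), path
    except Exception:  # never a bare except: Ctrl+C must still work
        conn.close()
        conn = httplib.HTTPSConnection(host)  # replace a broken connection
        return "error", "error", path
    finally:
        connections.put(conn)  # hand the connection back either way

pool = Pool(concurrent)
paths = ["/api?key=" + apiKey + "&value=" + v.strip() + "&years="
         for v in open('valuelist.txt')]
for status, body, path in pool.imap_unordered(checkStatus, paths):
    print status, path

Because every greenlet returns its connection in the finally block, the script never holds more than concurrent open sockets, and no connection is rebuilt except after an error.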
