
Multiple simultaneous HTTP requests

I'm trying to take a list of items and check for a status change in each one, based on certain processing by the API. The list is populated manually and can vary in size up to several thousand items.

I'm trying to write a script that makes multiple simultaneous connections to the API and keeps checking for the status change. For each item, once the status changes, the checks must stop. Based on other posts on Stack Overflow (specifically, What is the fastest way to send 100,000 HTTP requests in Python?), I've come up with the following code. But the script always stops after processing the list once. What am I doing wrong?

One additional issue I'm facing is that the KeyboardInterrupt handler never fires (I'm trying Ctrl+C, but it does not kill the script).

from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

requestURLBase = "https://example.com/api"
apiKey = "123456"

concurrent = 200

keepTrying = 1

def doWork():
    while keepTrying == 1:
        url = q.get()
        status, body, url = checkStatus(url)
        checkResult(status, body, url)
        q.task_done()

def checkStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(requestURLBase)
        conn.request("GET", url.path)
        res = conn.getresponse()
        respBody = res.read()
        conn.close()
        return res.status, respBody, ourl #Status can be 210 for error or 300 for successful API response
    except:
        print "ErrorBlock"
        print res.read()
        conn.close()
        return "error", "error", ourl

def checkResult(status, body, url):
    if "unavailable" not in body:
        print status, body, url
        keepTrying = 1
    else:
        keepTrying = 0

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()
try:
    for value in open('valuelist.txt'):
        fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
        print fullUrl
        q.put(fullUrl)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)

I'm new to Python, so there could be syntax errors as well... I'm definitely not familiar with multi-threading, so perhaps I'm doing something else wrong too.

In the code, the list is only read once. It should be something like:

try:
    while True:
        for value in open('valuelist.txt'):
            fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
            print fullUrl
            q.put(fullUrl)
        q.join()

For the interrupt issue, remove the bare except line in checkStatus, or change it to except Exception. A bare except catches all exceptions, including the SystemExit that sys.exit raises, and so prevents the Python process from terminating.
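A minimal demonstration of that difference (Python 3 syntax; the function names are just for illustration). SystemExit inherits from BaseException, not Exception, so only the bare except swallows it:

```python
import sys

def risky():
    try:
        sys.exit(1)        # raises SystemExit
    except:                # bare except: catches BaseException, SystemExit included
        return "swallowed"

def safer():
    try:
        sys.exit(1)
    except Exception:      # SystemExit is not an Exception subclass, so it escapes
        return "swallowed"

print(risky())             # "swallowed" -- the exit request was silently cancelled
try:
    safer()
except SystemExit:
    print("SystemExit propagated out of safer()")
```

The same mechanism is why Ctrl+C appears dead in the script above: KeyboardInterrupt is also a BaseException subclass, so a bare except in a worker thread's loop eats it.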

If I may make a couple of comments in general, though:

  • Threading is not a good implementation for such large concurrency
  • Creating a new connection every time is not efficient

What I would suggest is:

  1. Use gevent for asynchronous network I/O
  2. Pre-allocate a queue of connections the same size as the concurrency number, and have checkStatus grab a connection object whenever it needs to make a call. That way the connections stay alive and get reused, and there is no overhead from creating and destroying them (or the increased memory use that goes with it).
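The pooling idea in point 2 can be sketched with the standard library alone. ConnectionPool and FakeConn below are hypothetical names for illustration; in the real script the factory would be something like lambda: httplib.HTTPSConnection("example.com") (http.client in Python 3), and a worker would acquire before the request and release after reading the response:

```python
import queue  # Queue in Python 2

class ConnectionPool:
    """Fixed-size pool of reusable connection objects (illustrative sketch)."""

    def __init__(self, size, factory):
        # Pre-allocate all connections up front; the queue's capacity
        # caps concurrency at `size`.
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        # Blocks until a connection is free, so callers never exceed the pool.
        return self._pool.get()

    def release(self, conn):
        # Return the connection for reuse instead of closing it.
        self._pool.put(conn)

class FakeConn:
    """Stand-in for a real HTTP connection object."""

pool = ConnectionPool(3, FakeConn)
conn = pool.acquire()
# ... conn.request(...) / conn.getresponse() would go here ...
pool.release(conn)
```

Because queue.Queue is thread-safe, the same pool also works unchanged with the threaded design in the question, which makes it a reasonable first step even before switching to gevent.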
