Multiple simultaneous HTTP requests
I'm trying to take a list of items and check for their status change based on certain processing by the API. The list will be manually populated and can vary in number up to several thousand.

I'm trying to write a script that makes multiple simultaneous connections to the API to keep checking for the status change. For each item, once the status changes, the attempts to check must stop. Based on reading other posts on Stack Overflow (specifically, What is the fastest way to send 100,000 HTTP requests in Python?), I've come up with the following code. But the script always stops after processing the list once. What am I doing wrong?

One additional issue I'm facing is that the keyboard interrupt handler never fires (I'm trying with Ctrl+C, but it does not kill the script).
from urlparse import urlparse
from threading import Thread
import httplib, sys
from Queue import Queue

requestURLBase = "https://example.com/api"
apiKey = "123456"
concurrent = 200
keepTrying = 1

def doWork():
    while keepTrying == 1:
        url = q.get()
        status, body, url = checkStatus(url)
        checkResult(status, body, url)
        q.task_done()

def checkStatus(ourl):
    try:
        url = urlparse(ourl)
        conn = httplib.HTTPConnection(requestURLBase)
        conn.request("GET", url.path)
        res = conn.getresponse()
        respBody = res.read()
        conn.close()
        return res.status, respBody, ourl # Status can be 210 for error or 300 for successful API response
    except:
        print "ErrorBlock"
        print res.read()
        conn.close()
        return "error", "error", ourl

def checkResult(status, body, url):
    if "unavailable" not in body:
        print status, body, url
        keepTrying = 1
    else:
        keepTrying = 0

q = Queue(concurrent * 2)
for i in range(concurrent):
    t = Thread(target=doWork)
    t.daemon = True
    t.start()

try:
    for value in open('valuelist.txt'):
        fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
        print fullUrl
        q.put(fullUrl)
    q.join()
except KeyboardInterrupt:
    sys.exit(1)
I'm new to Python, so there could be syntax errors as well... I'm definitely not familiar with multi-threading, so perhaps I'm doing something else wrong too.
In the code, the list is only read once. It should be something like:
try:
    while True:
        for value in open('valuelist.txt'):
            fullUrl = requestURLBase + "?key=" + apiKey + "&value=" + value.strip() + "&years="
            print fullUrl
            q.put(fullUrl)
        q.join()
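An alternative to re-reading the whole file is to have each worker re-queue only the items whose status hasn't changed yet, which also satisfies the "stop checking once the status changes" requirement per item. A sketch of that pattern (Python 3 syntax; check_status here is a stub standing in for the real API call, and the names are illustrative — in a real script you would also sleep before re-queueing so you don't hammer the API):

```python
import queue
import threading

q = queue.Queue()
done = []                         # items whose status has changed
lock = threading.Lock()

# Stub standing in for the real API call: reports "unavailable"
# twice, then "done", so each item needs exactly three checks.
attempts = {}
def check_status(url):
    with lock:
        attempts[url] = attempts.get(url, 0) + 1
        return "done" if attempts[url] >= 3 else "unavailable"

def worker():
    while True:
        url = q.get()
        if check_status(url) == "unavailable":
            q.put(url)            # still pending: check it again later
        else:
            with lock:
                done.append(url)  # status changed: stop checking this item
        q.task_done()

urls = ["https://example.com/api?value=%d" % i for i in range(5)]
for url in urls:
    q.put(url)
for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()
q.join()                          # returns once every item is done
print(len(done))                  # -> 5
```

Because every re-queued item is a fresh put() matched by a later task_done(), q.join() only returns when all items have finished, so no global keepTrying flag is needed.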
For the interrupt thing, remove the bare except line in checkStatus or make it except Exception. Bare excepts will catch all exceptions, including the SystemExit that sys.exit raises, and stop the Python process from terminating.
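A minimal illustration of the difference (Python 3 syntax, but the hierarchy is the same in Python 2.6+): sys.exit raises SystemExit, which derives from BaseException but not from Exception, so a bare except swallows it while except Exception does not.

```python
import sys

def swallow_everything():
    try:
        sys.exit(1)
    except:                  # bare except: catches BaseException subclasses too
        return "swallowed"   # ...so the SystemExit from sys.exit never escapes

def let_exits_through():
    try:
        sys.exit(1)
    except Exception:        # SystemExit does not derive from Exception
        return "swallowed"   # never reached

print(swallow_everything())  # -> swallowed
try:
    let_exits_through()
except SystemExit:
    print("SystemExit propagated")   # -> SystemExit propagated
```

KeyboardInterrupt is likewise a direct BaseException subclass, which is why the bare except in checkStatus can eat Ctrl+C as well.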
If I may make a couple of comments in general, though: what I would suggest is having checkStatus grab an existing connection object when it needs to make a call, rather than building a new one each time. That way the connections stay alive and get reused, and there is no overhead from creating and destroying them, nor the increased memory use that goes with it.
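One way to sketch that suggestion (using Python 3's http.client, the successor to httplib; the host name is a placeholder): give each worker thread its own persistent connection via threading.local, so a thread opens one connection lazily and reuses it for every subsequent call it makes.

```python
import http.client
import threading

API_HOST = "example.com"     # placeholder for the real API host
_local = threading.local()   # one private slot per worker thread

def get_connection():
    # Lazily open one persistent connection per thread and reuse it,
    # instead of creating and tearing one down on every request.
    if not hasattr(_local, "conn"):
        _local.conn = http.client.HTTPSConnection(API_HOST)
    return _local.conn

def check_status(path):
    conn = get_connection()
    try:
        conn.request("GET", path)
        res = conn.getresponse()
        return res.status, res.read()
    except Exception:
        conn.close()         # the connection may be in a bad state,
        del _local.conn      # so drop it; the next call opens a fresh one
        raise
```

Because each connection is confined to a single thread, no locking is needed around the requests themselves.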