I am trying to query an API for some search results. I first get the number of results my search returned, and then spawn a number of threads equal to the number of result pages. However, as the number of pages grows, I start getting intermittent HTTP 401 errors from urllib2, even though I use the same API key for every URL I generate. The errors occur on different URLs each time. First, is this the best way to query an API for information that spans many pages (more than a thousand)? Second, why am I getting this bug?
import json
import threading
import urllib2

def worker(pageNum):
    # fetch one page of results and decode the JSON payload
    pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
    pageDetails = json.loads(pageDetails)
    #print pageDetails
    print str(pageNum) + "\n"
    return

def parallelRun(totalPages):
    pageList = range(totalPages)
    threads = []
    # one thread per result page
    for pageNum in pageList:
        t = threading.Thread(target=worker, args=(pageNum,))
        threads.append(t)
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return

parallelRun(numPages)
If you change your worker to something like this:
def worker(pageNum):
    try:
        pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
        pageDetails = json.loads(pageDetails)
        #print pageDetails
        print str(pageNum) + "\n"
        return
    except urllib2.HTTPError as err:
        print err.reason
        print err.read()  # the response body usually says why the server rejected the request
        raise
you will get more detailed information about what is going wrong.
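As for whether one thread per page is the best approach: firing a thousand requests at once is a likely trigger for server-side rate limiting, which some APIs report as intermittent 401s. A safer pattern is to cap how many requests are in flight at a time. Here is a minimal sketch using a thread-backed pool; the pool size of 8 is an assumption you should tune to whatever the API allows, and generateUrl is your existing helper:

import json
import urllib2
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but backed by threads

def worker(pageNum):
    try:
        pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
        return json.loads(pageDetails)
    except urllib2.HTTPError as err:
        print "page %d failed: %s" % (pageNum, err)
        return None

def parallelRun(totalPages, poolSize=8):  # poolSize caps concurrent requests (assumed value)
    pool = Pool(poolSize)
    try:
        # blocks until every page has been fetched; results come back in page order
        return pool.map(worker, range(totalPages))
    finally:
        pool.close()
        pool.join()

A side benefit of pool.map is that it returns the decoded pages as a list, so parallelRun hands the results back to the caller instead of just printing page numbers.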