I am trying to query an API for some search results. I first get the number of results my search returned, and then spawn a number of threads equal to the number of result pages. However, as the number of pages grows, I start getting intermittent HTTP 401 errors from urllib2, even though I use the same API key for every URL I generate. The errors occur on different URLs each time. First, is this the best way to query an API for information that spans many pages (more than a thousand)? Second, why am I getting this bug?
import json
import threading
import urllib2

def worker(pageNum):
    # fetch one page of results and decode the JSON payload
    pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
    pageDetails = json.loads(pageDetails)
    #print pageDetails
    print str(pageNum) + "\n"
    return

def parallelRun(totalPages):
    pageList = range(totalPages)
    threads = []
    # one thread per result page
    for pageNum in pageList:
        t = threading.Thread(target=worker, args=(pageNum,))
        threads.append(t)
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return

parallelRun(numPages)
If you change your worker to something like this:
def worker(pageNum):
    try:
        pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
        pageDetails = json.loads(pageDetails)
        #print pageDetails
        print str(pageNum) + "\n"
        return
    except urllib2.HTTPError as err:
        print err.reason
        print err.read()  # the response body usually says why the server rejected the request
        raise
you will get more detailed information about what is going wrong.
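As for whether one thread per page is the best approach: firing a thousand requests at once is a likely trigger for server-side rate limiting, which some APIs report as intermittent 401s. A safer pattern is to cap how many requests are in flight at a time. Here is a minimal sketch using a thread-backed pool; the pool size of 8 is an assumption you should tune to whatever the API allows, and generateUrl is your existing helper:

import json
import urllib2
from multiprocessing.dummy import Pool  # same API as multiprocessing.Pool, but backed by threads

def worker(pageNum):
    try:
        pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
        return json.loads(pageDetails)
    except urllib2.HTTPError as err:
        print "page %d failed: %s" % (pageNum, err)
        return None

def parallelRun(totalPages, poolSize=8):  # poolSize caps concurrent requests (assumed value)
    pool = Pool(poolSize)
    try:
        # blocks until every page has been fetched; results come back in page order
        return pool.map(worker, range(totalPages))
    finally:
        pool.close()
        pool.join()

A side benefit of pool.map is that it returns the decoded pages as a list, so parallelRun hands the results back to the caller instead of just printing page numbers.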