
Multi-threading GET requests with Python: error 401

I am trying to query an API for some search results. I first get the number of results my search has returned, and then spawn a number of threads equal to the number of result pages. However, when the number of pages gets higher, I start getting intermittent HTTP 401 errors from urllib2, even though I use the same API key for every url I generate. The errors occur on different urls each time. First of all, is this the best way to query an API for information that spans many pages (more than a thousand)? Secondly, why am I getting these errors?

import json
import threading
import urllib2

def worker(pageNum):
    # fetch one page of results and parse the JSON response
    pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
    pageDetails = json.loads(pageDetails)
    #print pageDetails
    print str(pageNum) + "\n"
    return

def parallelRun(totalPages):
    # spawn one thread per result page
    pageList = range(totalPages)
    threads = []
    for pageNum in pageList:
        t = threading.Thread(target=worker, args=(pageNum,))
        threads.append(t)

    for thread in threads:
        thread.start()

    for thread in threads:
        thread.join()
    return

parallelRun(numPages)

If you change your worker to something like this:

def worker(pageNum):
    try:
        pageDetails = urllib2.urlopen(generateUrl(pageNum), timeout=1000).read()
        pageDetails = json.loads(pageDetails)
        #print pageDetails
        print str(pageNum) + "\n"
        return
    except urllib2.HTTPError as err:
        # the status reason and the response body usually say why the API rejected the request
        print err.reason
        print err.read()
        raise

you will get more detailed information about what is going wrong.
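
The question also asks whether spawning one thread per result page is the best approach when there are more than a thousand pages. Many APIs reject bursts of simultaneous requests, so the intermittent 401s may simply be the server throttling you. As a minimal sketch (not part of the original answer), you could keep a fixed-size pool of worker threads that pull page numbers from a queue; NUM_WORKERS is an assumed value you would tune to the API's limits, and generateUrl and numPages are the asker's own helper and variable:

import json
import Queue
import threading
import urllib2

NUM_WORKERS = 10  # assumed concurrency limit; tune to what the API tolerates

def worker(page_queue):
    # each worker pulls page numbers until the queue is empty
    while True:
        try:
            pageNum = page_queue.get_nowait()
        except Queue.Empty:
            return
        try:
            pageDetails = json.loads(
                urllib2.urlopen(generateUrl(pageNum), timeout=1000).read())
            print str(pageNum) + "\n"
        except urllib2.HTTPError as err:
            print err.code
            print err.read()

def parallelRun(totalPages):
    page_queue = Queue.Queue()
    for pageNum in range(totalPages):
        page_queue.put(pageNum)
    threads = [threading.Thread(target=worker, args=(page_queue,))
               for _ in range(NUM_WORKERS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

parallelRun(numPages)

This keeps at most NUM_WORKERS requests in flight at once instead of a thousand, which is usually gentler on the API and makes any remaining 401s easier to attribute to a genuine authentication problem rather than load.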
