
How to make asynchronous HTTP GET requests in Python and pass response object to a function

Update: The problem turned out to be incomplete documentation; the event dispatcher passes keyword arguments to the hook function.

I have a list of about 30k URLs that I want to check for various strings. I have a working version of this script using Requests and BeautifulSoup, but it doesn't use threading or asynchronous requests, so it's incredibly slow.

Ultimately, what I would like to do is cache the HTML for each URL so I can run multiple checks without making redundant HTTP requests to each site. If I have a function that will store the HTML, what's the best way to send the HTTP GET requests asynchronously and then pass the response objects to that function?
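If grequests is not a hard requirement, the fetch-once, check-many design described above can be sketched with the standard library alone. This swaps in a `concurrent.futures` thread pool and `urllib.request` in place of grequests and gevent; `fetch_html`, `fetch_all`, and `run_checks` are hypothetical helper names, and the 20 worker threads and 10-second timeout are arbitrary assumptions to tune for a 30k-URL run.

```python
import concurrent.futures
import urllib.request


def fetch_html(url, timeout=10):
    """Download one page and return its body as text (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def fetch_all(urls, fetch=fetch_html, max_workers=20):
    """Fetch every URL concurrently and return {url: html}, or {url: None} on failure."""
    cache = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, u): u for u in urls}
        for fut in concurrent.futures.as_completed(futures):
            url = futures[fut]
            try:
                cache[url] = fut.result()
            except Exception:
                # Broad catch on purpose: one bad site should not stop the batch.
                cache[url] = None
    return cache


def run_checks(cache, needles):
    """For each cached page, list which strings in `needles` it contains."""
    return {
        url: [n for n in needles if html is not None and n in html]
        for url, html in cache.items()
    }
```

Usage would look like `cache = fetch_all(urls)` followed by as many `run_checks(cache, [...])` calls as needed, none of which touch the network again. The `fetch` parameter exists so the downloader can be replaced (for example with a Requests-based function) without changing the caching logic.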

I've been trying to use grequests (as described here) and its "hooks" parameter, but I'm getting errors and the documentation doesn't go very in-depth, so I'm hoping someone with more experience can shed some light.

Here's a simplified example of what I'm trying to accomplish:

import grequests

urls = ['http://www.google.com/finance', 'http://finance.yahoo.com/', 'http://www.bloomberg.com/']

def print_url(r):
    print(r.url)

def async_get(url_list):
    # Build one unsent request per URL, each with a response hook attached
    sites = []
    for u in url_list:
        rs = grequests.get(u, hooks=dict(response=print_url))
        sites.append(rs)
    # Send all requests concurrently and return the list of responses
    return grequests.map(sites)

print(async_get(urls))

And it produces the following TypeError:

TypeError: print_url() got an unexpected keyword argument 'verify'
<Greenlet at 0x32803d8L: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x00000000028D2160>>(stream=False)> failed with TypeError

Not sure why it's sending 'verify' as a keyword argument by default; it would be great to get something working though, so if anyone has any suggestions (using grequests or otherwise) please share :)

Thanks in advance.

I tried your code and could get it to work by adding an extra **kwargs parameter to your print_url function:

def print_url(r, **kwargs):
    print(r.url)
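The same **kwargs signature also covers the caching goal from the question. A minimal sketch, assuming the HTML should be kept in a plain dict keyed by URL; `make_cache_hook` and `cache_html` are hypothetical names, not part of grequests or Requests.

```python
def make_cache_hook(cache):
    """Build a response hook that stores each page's HTML in `cache` (hypothetical helper).

    The inner hook accepts **kwargs because Requests passes extra keyword
    arguments (verify, stream, ...) to response hooks.
    """
    def cache_html(r, **kwargs):
        # r.url is the final URL after any redirects, which may differ
        # from the URL that was originally requested.
        cache[r.url] = r.text
        return r  # returning the response leaves it unchanged

    return cache_html
```

With grequests, the returned function is passed the same way as print_url: create `html_cache = {}` and `hook = make_cache_hook(html_cache)`, build each request with `grequests.get(u, hooks=dict(response=hook))`, and call `grequests.map(...)` on the list. Afterwards `html_cache` holds the page text for every response that arrived, ready for repeated string checks.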

I figured out what was wrong from this other Stack Overflow question: Problems with hooks using Requests Python package.

It seems that when you use the response hook in grequests, you need to add **kwargs to your callback definition.
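The extra keyword arguments come from Requests itself, which grequests uses underneath: Requests invokes each response hook as `hook(response, **kwargs)`, forwarding keyword arguments of `Session.send()` such as `verify` and `stream`. The sketch below is a simplified model of that calling convention, not Requests' actual source; `dispatch_response_hook` is a hypothetical name, and a `SimpleNamespace` stands in for a real response so it runs without network access.

```python
from types import SimpleNamespace


def dispatch_response_hook(hook, response, **kwargs):
    """Simplified model of how Requests calls a response hook (hypothetical helper).

    Keyword arguments are forwarded to the hook; a non-None return value
    replaces the response, while None keeps the original.
    """
    result = hook(response, **kwargs)
    return response if result is None else result


def print_url_no_kwargs(r):  # the question's original signature
    print(r.url)


def print_url(r, **kwargs):  # the fixed signature from the answer
    print(r.url)


fake = SimpleNamespace(url="http://www.google.com/finance")
send_kwargs = {"verify": True, "stream": False}  # assumed example values

try:
    dispatch_response_hook(print_url_no_kwargs, fake, **send_kwargs)
except TypeError as exc:
    # Same failure as in the question: unexpected keyword argument 'verify'
    print("TypeError:", exc)

dispatch_response_hook(print_url, fake, **send_kwargs)  # prints the URL
```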


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM