Update: The problem was incomplete documentation; the event dispatcher passes keyword arguments to the hook function.
I have a list of about 30k URLs that I want to check for various strings. I have a working version of this script using Requests and BeautifulSoup, but it doesn't use threading or asynchronous requests, so it's incredibly slow.
Ultimately, I would like to cache the HTML for each URL so I can run multiple checks without making redundant HTTP requests to each site. If I have a function that stores the HTML, what's the best way to send the HTTP GET requests asynchronously and then pass the response objects to it?
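Since grequests is not the only option, here is a minimal sketch of the cache-then-check idea using the standard library's `concurrent.futures` thread pool instead of grequests. The names `fetch_html`, `build_cache` and `check_string` are hypothetical. `build_cache` takes the fetch function as a parameter, so the string checks can be run (or tested) against cached HTML without downloading anything again.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download one page and decode it as UTF-8 (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def build_cache(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_html,
    max_workers: int = 20,
) -> Dict[str, Optional[str]]:
    """Fetch every URL concurrently; failed downloads are cached as None."""
    url_list = list(urls)

    def safe_fetch(url: str) -> Optional[str]:
        try:
            return fetch(url)
        except Exception:  # one bad URL should not abort the whole batch
            return None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so zip pairs each URL with its HTML
        return dict(zip(url_list, pool.map(safe_fetch, url_list)))


def check_string(cache: Dict[str, Optional[str]], needle: str) -> Dict[str, bool]:
    """Run one string check against the cached HTML; no new HTTP requests."""
    return {url: html is not None and needle in html for url, html in cache.items()}
```

For example, `cache = build_cache(urls)` followed by several `check_string(cache, ...)` calls reuses the same downloaded pages. Threads fit this workload because most of the time is spent waiting on the network; `max_workers=20` is an arbitrary starting value to tune.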
I've been trying to use grequests (as described here) and its "hooks" parameter, but I'm getting errors and the documentation doesn't go into much depth, so I'm hoping someone with more experience can shed some light.
Here's a simplified example of what I'm trying to accomplish:
import grequests

urls = ['http://www.google.com/finance', 'http://finance.yahoo.com/', 'http://www.bloomberg.com/']

# response hook: print the URL of each response as it arrives
def print_url(r):
    print(r.url)

# 'async' is a reserved word in Python 3.7+, hence the name async_get
def async_get(url_list):
    sites = []
    for u in url_list:
        rs = grequests.get(u, hooks=dict(response=print_url))
        sites.append(rs)
    return grequests.map(sites)

print(async_get(urls))
And it produces the following TypeError:
TypeError: print_url() got an unexpected keyword argument 'verify'
<Greenlet at 0x32803d8L: <bound method AsyncRequest.send of <grequests.AsyncRequest object at 0x00000000028D2160>>(stream=False)> failed with TypeError
I'm not sure why it's sending 'verify' as a keyword argument by default. It would be great to get something working, though, so if anyone has suggestions (using grequests or otherwise), please share :)
Thanks in advance.
I tried your code and got it to work by adding an extra **kwargs parameter to your print_url function:
def print_url(r, **kwargs):
    print(r.url)
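Applied to the original goal of caching HTML, the same **kwargs signature lets a response hook store each page as it arrives. This is a sketch that has not been checked against a specific grequests release: `make_cache_hook` and `fetch_into_cache` are hypothetical names, and grequests is imported inside the function only so the hook can be exercised without the package installed. In a real script, import grequests at the top, since it monkey-patches sockets via gevent when imported.

```python
from typing import Any, Callable, Dict, Iterable


def make_cache_hook(cache: Dict[str, str]) -> Callable[..., Any]:
    """Return a response hook that stores each page's HTML in `cache`."""

    def cache_html(r: Any, **kwargs: Any) -> Any:
        # The hook receives extra keyword arguments (e.g. verify, stream);
        # accepting **kwargs is what avoids the TypeError from the question.
        cache[r.url] = r.text
        return r  # returning the response leaves it unchanged for the caller

    return cache_html


def fetch_into_cache(urls: Iterable[str], size: int = 20) -> Dict[str, str]:
    """Send all GETs through grequests and collect the HTML via the hook."""
    import grequests  # assumption: grequests is installed

    cache: Dict[str, str] = {}
    hook = make_cache_hook(cache)
    reqs = [grequests.get(u, hooks={"response": hook}) for u in urls]
    grequests.map(reqs, size=size)  # size caps the number of concurrent requests
    return cache
```

Note that `r.url` is the final URL after any redirects, so cache keys may not match the input list exactly, and URLs whose requests fail never reach the hook and are simply absent from the cache.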
I figured out what was wrong from this other Stack Overflow question: Problems with hooks using Requests Python package.
It seems that when you use the response hook in grequests, you need to add **kwargs to your callback definition.
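To see why the extra parameter matters, here is a simplified stand-in for the way Requests dispatches hooks: each hook is called as `hook(data, **kwargs)`, with the request's send options forwarded as keyword arguments, so a one-argument callback fails exactly as in the question. `dispatch_hook_sketch` is a hypothetical imitation written for illustration, not the library's own code.

```python
from typing import Any, Callable, Dict, List, Union

Hook = Callable[..., Any]


def dispatch_hook_sketch(key: str,
                         hooks: Dict[str, Union[Hook, List[Hook]]],
                         hook_data: Any, **kwargs: Any) -> Any:
    """Simplified imitation of how Requests runs hooks (not the real code)."""
    registered = hooks.get(key, [])
    if callable(registered):  # a single function is treated as a one-item list
        registered = [registered]
    for hook in registered:
        result = hook(hook_data, **kwargs)  # send options are forwarded here
        if result is not None:  # a hook may replace the data it was given
            hook_data = result
    return hook_data


def print_url(r):  # the question's signature: only one parameter
    print(r)


def print_url_fixed(r, **kwargs):  # the answer's signature: extras are ignored
    print(r)


if __name__ == "__main__":
    url = "http://www.google.com/finance"
    send_kwargs = {"verify": True, "stream": False}  # stand-ins for send options
    for hook in (print_url, print_url_fixed):
        try:
            dispatch_hook_sketch("response", {"response": hook}, url, **send_kwargs)
        except TypeError as exc:
            print("TypeError:", exc)
```

Run as a script, the first hook raises the same "unexpected keyword argument 'verify'" error, while the second prints the URL.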