How do I use requests_html to asynchronously get() a list of URLs?

I'm trying to asynchronously get() a list of URLs using the Python package requests_html, similar to the async example in the README. I'm using Python 3.6.5 and requests_html 0.10.0.

My understanding is that AsyncHTMLSession.run() is supposed to work much like asyncio.gather(): you give it a bunch of awaitables, and it runs all of them. Is that incorrect?

Here's the code I'm trying, which I expect to get the pages and store the responses:

from requests_html import AsyncHTMLSession

async def get_link(url):
    r = await asession.get(url)
    return r

asession = AsyncHTMLSession()
results = asession.run(get_link("http://google.com"), get_link("http://yahoo.com"))

But I'm getting this exception instead:

Traceback (most recent call last):
  File "test.py", line 10, in <module>
    results = asession.run(get_link("google.com"), get_link("yahoo.com"))
  File ".\venv\lib\site-packages\requests_html.py", line 772, in run
    asyncio.ensure_future(coro()) for coro in coros
  File ".\venv\lib\site-packages\requests_html.py", line 772, in <listcomp>
    asyncio.ensure_future(coro()) for coro in coros
TypeError: 'coroutine' object is not callable
sys:1: RuntimeWarning: coroutine 'get_link' was never awaited

Am I doing something wrong?

You are not calling asession.run correctly.

asyncio.gather accepts awaitable objects, such as coroutine objects obtained by just calling a coroutine (async) function. asession.run, on the other hand, accepts callables, such as async functions, which it will invoke to produce awaitables. The difference is like that between one function that accepts an iterable, to which you could pass e.g. an instantiated generator, and another that accepts a callable returning an iterable, to which you could pass the generator function itself.
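
The distinction can be reproduced without any network access. The sketch below is only an illustration, not the actual requests_html implementation: run_all is a hypothetical helper that, like the coro() call visible in the traceback, calls each argument before awaiting it, and fake_get is a made-up coroutine function standing in for a real request. It uses asyncio.run, so it assumes Python 3.7 or later.

```python
import asyncio

async def fake_get(url):
    # Hypothetical stand-in for a real HTTP request; does no I/O.
    await asyncio.sleep(0)
    return "response for " + url

async def run_all(*coros):
    # Hypothetical helper: like the `coro()` in the traceback, it *calls*
    # each argument to obtain an awaitable, so it needs callables.
    return await asyncio.gather(*(coro() for coro in coros))

async def main():
    # asyncio.gather takes awaitables: call the async function yourself.
    gathered = await asyncio.gather(fake_get("a"), fake_get("b"))
    # run_all takes callables: pass something it can call later.
    ran = await run_all(lambda: fake_get("a"), lambda: fake_get("b"))
    return gathered, ran

gathered, ran = asyncio.run(main())
```

Passing fake_get("a") directly to run_all would fail the same way as the code in the question, because a coroutine object cannot be called.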

Since your async functions take arguments, you cannot just pass get_link to asession.run; you must use functools.partial or a lambda instead:

results = asession.run(
    lambda: get_link("http://google.com"),
    lambda: get_link("http://yahoo.com"),
)
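
For a longer list of URLs, the callables can be built in a loop. What follows is a sketch under assumptions: get_link here is a stand-in that does no real I/O, call_all is a hypothetical helper used only to exercise the callables, and asyncio.run requires Python 3.7 or later. functools.partial binds each URL when the partial is created, whereas a bare lambda: get_link(url) written inside a loop looks up url only when it is finally called, by which time the loop has finished.

```python
import asyncio
import functools

async def get_link(url):
    # Stand-in for the question's get_link; a real one would
    # `await asession.get(url)` instead of sleeping.
    await asyncio.sleep(0)
    return url

urls = ["http://google.com", "http://yahoo.com"]

# Each partial is a zero-argument callable that returns a fresh coroutine
# with its own URL already bound.
tasks = [functools.partial(get_link, url) for url in urls]

# Pitfall: these lambdas all read `url` when called, after the loop ends,
# so every one of them fetches the last URL.
late_bound = [lambda: get_link(url) for url in urls]

async def call_all(callables):
    # Hypothetical helper: call each callable, then await the coroutines.
    return await asyncio.gather(*(c() for c in callables))

from_partials = asyncio.run(call_all(tasks))
from_lambdas = asyncio.run(call_all(late_bound))
```

With the session from the question, the partials would presumably be passed as asession.run(*tasks). If a lambda is preferred, writing lambda url=url: get_link(url) binds the value at definition time and avoids the pitfall.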
