Python, Trio async function upon needs
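In trio/anyio, is it possible to pause tasks until I perform a specific operation, and then resume all of them?
Say I run a specific function to obtain a valid cookie and then start crawling a website. Sometimes this cookie expires, and I need to run that function again to get a new one.
So if I spawn 10 tasks under a nursery and the cookie expires while 6 of them are running, how can I pause all of them and run this function only once?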
import trio
import httpx


async def get_cookies(client):
    # Let's say that here I will use a headless browser operation to obtain a valid cookie.
    pass


limiter = trio.CapacityLimiter(20)


async def crawler(client, url, sender):
    async with limiter, sender:
        r = await client.get(url)
        if "something special happen" in r.text:
            pass
            # Here I want to say: if my cookie has expired,
            # then I want to run get_cookies() only one time.
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        await get_cookies(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                # pass the URL too, since crawler() takes (client, url, sender)
                nurse.start_soon(crawler, client, url, sender.clone())


async def rec(receiver):
    async with receiver:
        # memory channels are async iterables, so this needs "async for"
        async for i in receiver:
            print(i)


if __name__ == "__main__":
    trio.run(main)
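You simply wrap get_cookies in an "async with some_lock" block. Inside that block, if you already have a cookie (say it's a global variable), you return it; otherwise you fetch one and then set the global variable.
When you notice that the cookie has expired, you delete it (i.e. set the global back to None) and call get_cookies.
In other words, something along these lines: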
class CrawlData:
    def __init__(self, client):
        self.client = client
        self.valid = False
        self.lock = trio.Lock()
        self.limiter = trio.CapacityLimiter(20)

    async def get_cookie(self):
        if self.valid:
            return
        async with self.lock:
            # re-check: another task may have fetched the cookie while we waited
            if self.valid:
                return
            ... # fetch cookie here, using self.client
            self.valid = True

    async def get(self, url):
        r = await self.client.get(url)
        if check_for_expired_cookie(r):
            # mark the cookie as stale, otherwise get_cookie() returns immediately
            self.valid = False
            await self.get_cookie()
            r = await self.client.get(url)
            if check_for_expired_cookie(r):
                raise RuntimeError("New cookie doesn't work", r)
        return r


async def crawler(data, url, sender):
    async with data.limiter, sender:
        r = await data.get(url)
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        data = CrawlData(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                # crawler() expects the CrawlData object and the URL
                nurse.start_soon(crawler, data, url, sender.clone())
...