
Python, Trio: re-running an async function when needed

In trio/anyio, is it possible to pause all tasks until I perform a specific operation, and then resume them all?

Say I run a particular function to obtain a valid cookie and then start scraping a website. At some point that cookie expires, and I need to run the same function again to get a fresh one.

So if 10 tasks are spawned under a nursery and the cookie expires while 6 of them are still running, how can I pause all of them and run that function exactly once?

import trio
import httpx


async def get_cookies(client):
    # Assume a headless-browser operation here that obtains a valid cookie.
    pass


limiter = trio.CapacityLimiter(20)


async def crawler(client, url, sender):
    async with limiter, sender:
        r = await client.get(url)
        if "something special happen" in r.text:
            # If the cookie has expired here, I want to run
            # get_cookies() exactly once, no matter how many
            # tasks notice the expiry at the same time.
            pass
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        await get_cookies(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, client, url, sender.clone())


async def rec(receiver):
    async with receiver:
        async for i in receiver:
            print(i)

if __name__ == "__main__":
    trio.run(main)

You simply wrap get_cookies in an "async with some_lock" block. Inside that block, if you already have a cookie (say it is stored in a shared variable), return it; otherwise fetch one and set the variable.

When you notice that the cookie has expired, you discard it (i.e. reset the shared state to "invalid") and call get_cookies again.

In other words, something along these lines:

class CrawlData: 
    def __init__(self, client):
        self.client = client
        self.valid = False
        self.lock = trio.Lock()
        self.limiter = trio.CapacityLimiter(20)
        
    async def get_cookie(self):
        if self.valid:
            return

        async with self.lock:
            if self.valid:
                # another task refreshed the cookie while we waited
                return

            ... # fetch cookie here, using self.client

            self.valid = True
                                   
    async def get(self, url):
        r = await self.client.get(url)
        if check_for_expired_cookie(r):
            self.valid = False  # discard the stale cookie so get_cookie refreshes it
            await self.get_cookie()
            r = await self.client.get(url)
            if check_for_expired_cookie(r):
                raise RuntimeError("New cookie doesn't work", r)
        return r
        

async def crawler(data, url, sender):
    async with data.limiter, sender: 
        r = await data.get(url)      
        await sender.send(r.text)
        
        
async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        data = CrawlData(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, data, url, sender.clone())
        ...
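The key property of this double-checked-lock pattern is that when several tasks hit an expired cookie at the same moment, only one of them performs the refresh; the rest block on the lock and then return as soon as they see the flag is valid again. Here is a minimal runnable sketch of that property, using the stdlib asyncio so it needs no third-party packages (the trio version is the same apart from the Lock class and run call); TokenCache and worker are hypothetical names, not part of the answer's code:

```python
import asyncio


class TokenCache:
    # Hypothetical stand-in for CrawlData: same double-checked-lock idea.
    def __init__(self):
        self.valid = False
        self.refreshes = 0  # counts how many real refreshes happen
        self.lock = asyncio.Lock()

    async def get_token(self):
        if self.valid:  # fast path: no lock needed
            return
        async with self.lock:
            if self.valid:  # another task refreshed while we waited
                return
            await asyncio.sleep(0.01)  # simulate the slow cookie fetch
            self.refreshes += 1
            self.valid = True


async def worker(cache):
    await cache.get_token()


async def main():
    cache = TokenCache()
    # Ten tasks race for the token, but only one refresh is performed.
    await asyncio.gather(*(worker(cache) for _ in range(10)))
    print(cache.refreshes)  # -> 1


asyncio.run(main())
```

The inner `if self.valid` check after acquiring the lock is what makes the refresh happen only once: every task that was queued behind the refresher sees the flag already set and returns without doing any work.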

