
Python, Trio: re-running an async function when needed

In trio/anyio, is it possible to pause all tasks until I perform a specific operation, and then resume them all?

Say I run a particular function to obtain a valid cookie and then start scraping a website. At some point that cookie expires, and I need to run the same function again to get a fresh one.

So if 10 tasks are spawned under a nursery and the cookie expires while 6 of them are still running, how can I pause all of them and run that function exactly once?

import trio
import httpx


async def get_cookies(client):
    # Assume a headless-browser operation here that obtains a valid cookie.
    pass


limiter = trio.CapacityLimiter(20)


async def crawler(client, url, sender):
    async with limiter, sender:
        r = await client.get(url)
        if "something special happen" in r.text:
            # If the cookie has expired here, I want to run
            # get_cookies() exactly once, no matter how many
            # tasks notice the expiry at the same time.
            pass
        await sender.send(r.text)


async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        await get_cookies(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, client, url, sender.clone())


async def rec(receiver):
    async with receiver:
        async for i in receiver:
            print(i)

if __name__ == "__main__":
    trio.run(main)

You simply wrap get_cookies in an "async with some_lock" block. Inside that block, if you already have a cookie (say it is stored in a shared variable), return it; otherwise fetch one and set the variable.

When you notice that the cookie has expired, you discard it (i.e. reset the shared state to "invalid") and call get_cookies again.

In other words, something along these lines:

class CrawlData: 
    def __init__(self, client):
        self.client = client
        self.valid = False
        self.lock = trio.Lock()
        self.limiter = trio.CapacityLimiter(20)
        
    async def get_cookie(self):
        if self.valid:
            return

        async with self.lock:
            if self.valid:
                # another task refreshed the cookie while we waited
                return

            ... # fetch cookie here, using self.client

            self.valid = True
                                   
    async def get(self, url):
        r = await self.client.get(url)
        if check_for_expired_cookie(r):
            self.valid = False  # discard the stale cookie so get_cookie refreshes it
            await self.get_cookie()
            r = await self.client.get(url)
            if check_for_expired_cookie(r):
                raise RuntimeError("New cookie doesn't work", r)
        return r
        

async def crawler(data, url, sender):
    async with data.limiter, sender: 
        r = await data.get(url)      
        await sender.send(r.text)
        
        
async def main():
    async with httpx.AsyncClient() as client, trio.open_nursery() as nurse:
        data = CrawlData(client)
        sender, receiver = trio.open_memory_channel(0)
        nurse.start_soon(rec, receiver)
        urls = []
        async with sender:
            for url in urls:
                nurse.start_soon(crawler, data, url, sender.clone())
        ...
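The key property of this double-checked-lock pattern is that when several tasks hit an expired cookie at the same moment, only one of them performs the refresh; the rest block on the lock and then return as soon as they see the flag is valid again. Here is a minimal runnable sketch of that property, using the stdlib asyncio so it needs no third-party packages (the trio version is the same apart from the Lock class and run call); TokenCache and worker are hypothetical names, not part of the answer's code:

```python
import asyncio


class TokenCache:
    # Hypothetical stand-in for CrawlData: same double-checked-lock idea.
    def __init__(self):
        self.valid = False
        self.refreshes = 0  # counts how many real refreshes happen
        self.lock = asyncio.Lock()

    async def get_token(self):
        if self.valid:  # fast path: no lock needed
            return
        async with self.lock:
            if self.valid:  # another task refreshed while we waited
                return
            await asyncio.sleep(0.01)  # simulate the slow cookie fetch
            self.refreshes += 1
            self.valid = True


async def worker(cache):
    await cache.get_token()


async def main():
    cache = TokenCache()
    # Ten tasks race for the token, but only one refresh is performed.
    await asyncio.gather(*(worker(cache) for _ in range(10)))
    print(cache.refreshes)  # -> 1


asyncio.run(main())
```

The inner `if self.valid` check after acquiring the lock is what makes the refresh happen only once: every task that was queued behind the refresher sees the flag already set and returns without doing any work.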

