
Python Trio set up a decimal number of workers

I'm working with trio to run asynchronous concurrent tasks that scrape different websites. I'd like to be able to choose how many concurrent workers the tasks are divided among. To do so, I've written this code:

async def run_task():
    s = trio.Session(connections=5)
    Total_to_check = to_check() / int(module().workers)
    line = 0
    if int(Total_to_check) < 1:
        Total_to_check = 1
        module().workers = int(to_check())
    for i in range(int(Total_to_check)):
        try:
            async with trio.open_nursery() as nursery:
                for x in range(int(module().workers)):
                    nursery.start_soon(python_worker, self, s, x, line)
                    line += 1
        except BlockingIOError as e:
            print("[Fatal Error]", str(e))
            continue

In this example, to_check() returns the number of URLs to fetch data from, and module().workers is the number of concurrent workers I'd like to use.

So if I have, say, 30 URLs and ask for 10 concurrent tasks, it fetches data from 10 URLs concurrently and repeats the procedure 3 times.

This is all well and good until Total_to_check (the number of URLs divided by the number of workers) is not a whole number. If I have, say, 15 URLs and ask for 10 workers, this code only checks 10 URLs. The same happens if I have 20 URLs but ask for 15 workers. I could do something like math.ceil(Total_to_check), but then it would start trying to check URLs that don't exist.

How can I make this work properly, so that with, say, 10 concurrent tasks and 15 URLs, it checks the first 10 concurrently and then the last 5 concurrently, without skipping URLs (or trying to check too many)?
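
To make the arithmetic concrete: the batching described above comes down to slicing the URL list in steps of the worker count, so the last batch is simply shorter and nothing past the end is touched. A minimal sketch, assuming the URLs live in a plain list (the batches helper and the example.com URLs are hypothetical, not part of the code above):

```python
def batches(urls, workers):
    """Split urls into consecutive batches of at most `workers` items."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Slicing past the end of a list is safe: the last batch is just shorter.
    return [urls[i:i + workers] for i in range(0, len(urls), workers)]


# Placeholder URLs, assumed purely for illustration.
urls = [f"https://example.com/page/{n}" for n in range(15)]
sizes = [len(batch) for batch in batches(urls, 10)]
print(sizes)  # [10, 5]
```

Each batch could then be handed to its own nursery, although every batch would still wait for its slowest URL before the next one starts.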

Thanks!

Well, here comes the CapacityLimiter, which you would use like this:

async def python_worker(self, session, workers, line, limit):
    async with limit:
        ...

Then you can simplify your run_task:

async def run_task():
    limit = trio.CapacityLimiter(10)
    s = trio.Session(connections=5)
    line = 0
    async with trio.open_nursery() as nursery:
        for x in range(int(to_check())):
            nursery.start_soon(python_worker, self, s, x, line, limit)
            line += 1      

I believe the BlockingIOError handling would have to move inside python_worker too, because nursery.start_soon() doesn't block; it is the __aexit__ of the nursery that automagically waits at the end of the async with trio.open_nursery() as nursery block.
