How to concurrently run asyncio functions at scale
I am trying to make my code run faster at finding Roblox account names. I tried using larger and larger event loops (they basically took the previous event manager and used it to build a larger event manager), but that resulted in the same, if not worse, performance compared to using just a single small event loop.
This code was supplied in another question of mine (with modifications from me here). It works great, but it can still take a good few minutes to handle larger quantities of accounts. Usually I wouldn't care, but I am trying to reach 100,000 accounts, so I need performance. Is this just as fast as it can go, or can we push it further? Is the answer simply more CPU/memory? Better internet? Do I need network programming at all, or is there a faster, request-free way?
Code:
import asyncio
import aiohttp

async def find_account(url, session, id):
    try:
        async with session.get(url) as response:
            if response.status == 200:
                r = await response.read()
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(r, 'html.parser')
                h2 = []
                for i in soup.find_all('h2'):
                    h2.append(i)
                print('Done')
                return str(list(list(h2)[0])[0]) + ' ' + str(url)
            else:
                return 'This account does not exist ID: {}'.format(id)
    except aiohttp.ServerDisconnectedError:
        print('Done')
        return find_account(url, session, id)

async def main(min_id, max_id):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for id in range(min_id, max_id):
            url = f'https://web.roblox.com/users/{str(id)}/profile'
            tasks.append(asyncio.create_task(find_account(url=url, session=session, id=id)))
        return await asyncio.gather(*tasks)

from time import time

loop = asyncio.get_event_loop()
starting = int(input("Type Your Starting Id Number>> "))
ending = int(input("Type Your Ending Id Number>> "))
timer = time()
users = loop.run_until_complete(main(starting, ending))
users = [i for i in users if i != '1']
print(users)
print(time() - timer)
You could run BeautifulSoup in multiple processes to speed it up. For example, you can extract the part of find_account that does the parsing and pass it to a process pool executor:
import asyncio
import concurrent.futures

import aiohttp

_pool = concurrent.futures.ProcessPoolExecutor()

def parse(html):
    # Imported here so bs4 is loaded in each worker process
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    h2 = []
    for i in soup.find_all('h2'):
        h2.append(i)
    return str(list(list(h2)[0])[0])

async def find_account(url, session, id):
    while True:
        try:
            async with session.get(url) as response:
                if response.status == 200:
                    r = await response.read()
                    loop = asyncio.get_event_loop()
                    extracted = await loop.run_in_executor(_pool, parse, r)
                    print('Done')
                    return extracted + ' ' + str(url)
                else:
                    return 'This account does not exist ID: {}'.format(id)
        except aiohttp.ServerDisconnectedError:
            print('Done')
            # keep looping
On an unrelated note, your recursive call to find_account() was incorrect because it was missing an await. The above code fixes that and switches to a loop instead, which makes it a bit more explicit that the code is in fact looping.
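To see why the missing await matters, here is a minimal stand-alone reproduction (get_name is a hypothetical stub standing in for find_account):

```python
import asyncio

async def get_name(id):
    return 'user {}'.format(id)

async def missing_await():
    return get_name(1)  # bug: returns the coroutine object, never runs it

async def with_await():
    return await get_name(1)  # correct: runs the coroutine and returns its value

bad = asyncio.run(missing_await())
good = asyncio.run(with_await())
print(type(bad).__name__)  # coroutine
print(good)                # user 1
bad.close()  # suppress the "coroutine was never awaited" warning
```

The un-awaited call silently returns a coroutine object instead of the string you expect, which is why the bug is easy to miss.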
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.
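Separately, when scaling toward 100,000 ids, creating every task at once can overwhelm the connection pool and the remote server. One common way to cap the number of in-flight requests is an asyncio.Semaphore; this is only a sketch, with a hypothetical fetch stub (asyncio.sleep) standing in for the real session.get call:

```python
import asyncio

CONCURRENCY = 100  # tune to what your connection and the server tolerate

async def fetch(id, sem):
    async with sem:  # at most CONCURRENCY coroutines pass this point at once
        await asyncio.sleep(0)  # stand-in for the real session.get(...) call
        return 'ID: {}'.format(id)

async def run_all(min_id, max_id):
    sem = asyncio.Semaphore(CONCURRENCY)
    tasks = [asyncio.create_task(fetch(i, sem)) for i in range(min_id, max_id)]
    return await asyncio.gather(*tasks)  # results keep the original order

results = asyncio.run(run_all(0, 5))
print(results)  # ['ID: 0', 'ID: 1', 'ID: 2', 'ID: 3', 'ID: 4']
```

The same pattern drops into the original main(): acquire the semaphore inside find_account around the request, and the rest of the code is unchanged.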