I am trying to make my code run faster for finding Roblox account names. I tried using larger and larger event loops (each one wrapped the previous event manager to build a bigger one), but that gave the same, if not worse, performance compared to a single small event loop.
This code was supplied in an answer to another question of mine (with my own modifications here). It works great, but it can still take a good few minutes to handle larger quantities of accounts. Usually I wouldn't care, but I am trying to get to 100,000 accounts, so I need the performance. Is this as fast as it can go, or can it be driven further? Is the answer just more CPU/memory, or better internet? Do I need network requests at all, or is there a faster, request-free way?
Code:
import asyncio
import aiohttp

async def find_account(url, session, id):
    try:
        async with session.get(url) as response:
            if response.status == 200:
                r = await response.read()
                from bs4 import BeautifulSoup
                soup = BeautifulSoup(r, 'html.parser')
                h2 = []
                for i in soup.find_all('h2'):
                    h2.append(i)
                print('Done')
                return str(list(list(h2)[0])[0]) + ' ' + str(url)
            else:
                return 'This account does not exist ID: {}'.format(id)
    except aiohttp.ServerDisconnectedError:
        print('Done')
        return find_account(url, session, id)

async def main(min_id, max_id):
    tasks = []
    async with aiohttp.ClientSession() as session:
        for id in range(min_id, max_id):
            url = f'https://web.roblox.com/users/{id}/profile'
            tasks.append(asyncio.create_task(find_account(url=url, session=session, id=id)))
        return await asyncio.gather(*tasks)

from time import time

loop = asyncio.get_event_loop()
starting = int(input("Type Your Starting Id Number>> "))
ending = int(input("Type Your Ending Id Number>> "))
timer = time()
users = loop.run_until_complete(main(starting, ending))
users = [i for i in users if i != '1']
print(users)
print(time() - timer)
You could run BeautifulSoup in multiple processes to speed it up. For example, you can extract the part of find_account that does the parsing and pass it to a process pool executor:
import concurrent.futures

_pool = concurrent.futures.ProcessPoolExecutor()

def parse(html):
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(html, 'html.parser')
    h2 = []
    for i in soup.find_all('h2'):
        h2.append(i)
    return str(list(list(h2)[0])[0])

async def find_account(url, session, id):
    while True:
        try:
            async with session.get(url) as response:
                if response.status == 200:
                    r = await response.read()
                    loop = asyncio.get_event_loop()
                    # hand the CPU-bound parsing off to the process pool
                    extracted = await loop.run_in_executor(_pool, parse, r)
                    print('Done')
                    return extracted + ' ' + str(url)
                else:
                    return 'This account does not exist ID: {}'.format(id)
        except aiohttp.ServerDisconnectedError:
            print('Done')
            # server disconnected - keep looping and retry the request
On an unrelated note, your recursive call to find_account() was incorrect because it was missing an await. The above code fixes that and switches to a loop instead, which makes it a bit more explicit that the code is in fact looping.
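To see why the missing await matters, here is a minimal, self-contained sketch (toy functions, not the original code): returning a coroutine without awaiting it hands the caller a coroutine object instead of the eventual result.

```python
import asyncio

async def broken():
    async def inner():
        return 'done'
    return inner()          # bug: missing await - returns a coroutine object

async def fixed():
    async def inner():
        return 'done'
    return await inner()    # awaiting propagates the real result

leaked = asyncio.run(broken())
print(asyncio.iscoroutine(leaked))  # True: the caller got a coroutine, not 'done'
leaked.close()                      # close it to silence the "never awaited" warning

print(asyncio.run(fixed()))         # done
```

In the original code, `return find_account(url, session, id)` without an await meant the retried lookup handed a coroutine object back to `asyncio.gather()` instead of the result string.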