How can I change this code to use context managers?

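I'm trying to log into a website concurrently with multiple credentials using aiohttp and asyncio. In the create_tasks function, I generate a list of sessions, one for each login. I can't simply create a session inside the login function, because the same session object will be used throughout the rest of the code. What I'm trying to do is devise a way to use a context manager to handle closing the sessions (to avoid the runtime errors caused by leaving them open).

The following code works as intended (it fetches the login pages concurrently and parses the token in a process pool), but it creates the sessions separately from the tasks and requires me to close them at the end.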

from bs4 import BeautifulSoup
from concurrent.futures import ProcessPoolExecutor
import aiohttp
import asyncio

#TODO: make this safe, handle exceptions

LOGIN_URL = "http://example.com/login"
CLIENT_CNT = 10
proc_pool = ProcessPoolExecutor(CLIENT_CNT)

def get_key(text):
    soup = BeautifulSoup(text, "html.parser")
    form = soup.find("form")
    key = form.find("input", attrs={"type": "hidden", "name": "authenticityToken"})
    return key.get("value", None)

async def login(username:str, password:str, session:aiohttp.ClientSession, sem:asyncio.BoundedSemaphore, loop:asyncio.AbstractEventLoop=None):
    loop = loop or asyncio.get_event_loop()
    async with sem:
        async with session.get(LOGIN_URL) as resp:
            x = await asyncio.ensure_future(loop.run_in_executor(proc_pool, get_key, await resp.text()))
            print(x)

def create_tasks(usernames, passwords, sem:asyncio.BoundedSemaphore, loop:asyncio.AbstractEventLoop=None):
    loop = loop or asyncio.get_event_loop()
    tasks = []
    sessions = []
    for u, p in zip(usernames, passwords):
        session = aiohttp.ClientSession(loop=loop)
        sessions.append(session)
        tasks.append(login(u, p, session, sem, loop))
    return tasks, sessions

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    sem = asyncio.BoundedSemaphore(CLIENT_CNT)
    usernames = ("a", "b", "c", "d", "e", "f", "g")
    passwords = ("a", "b", "c", "d", "e", "f", "g")
    tasks, sessions = create_tasks(usernames, passwords, sem, loop)
    loop.run_until_complete(asyncio.gather(*tasks, loop=loop))
    for session in sessions:
        session.close()

I previously made create_tasks a coroutine, wrote a wrapper class to make an async iterable, and tried using

async with aiohttp.ClientSession() as session:
    tasks.append(login(u, p, session, sem, loop))

But as I feared, it reported that the session was already closed by the time the task ran.
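
The "session is closed" error follows from when a coroutine body actually runs: calling `login(...)` only creates a coroutine object, and nothing inside it executes until it is awaited, which here happens after the `async with` block has exited. A minimal sketch that reproduces this without aiohttp, using a hypothetical `FakeSession` stand-in (not part of any library):

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession; only tracks open/closed."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True


async def login(session):
    # Runs only when awaited, so it sees the session's state at that moment.
    return "closed" if session.closed else "open"


async def broken():
    tasks = []
    async with FakeSession() as session:
        # Only creates a coroutine object; login() has not started yet.
        tasks.append(login(session))
    # The block has exited, so the session is closed before login() runs.
    return await asyncio.gather(*tasks)


async def fixed():
    async with FakeSession() as session:
        # Awaiting inside the block keeps the session open while login() runs.
        return await asyncio.gather(login(session))


broken_result = asyncio.run(broken())
fixed_result = asyncio.run(fixed())
print(broken_result, fixed_result)
```

The general point is that whatever awaits the tasks has to sit inside the context manager that owns the session.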

Here's a structure that makes it easier to reason about:

async def user(u, p, ...):
    """Everything a single user does"""
    auth = await login(u, p)
    await download_something(auth, ...)
    await post_something(auth, ...)

async def login(u, p):
    # the session lives only as long as this one login request
    async with aiohttp.ClientSession() as session:
        async with session.get("http://xxx/login", ...) as r:
            data = await r.json()
            return data["something"]

async def download_something(...): ...
async def post_something(...): ...

async def everything():
    creds = [("u1", "p1"), ...]
    # unpack each (username, password) pair into user()'s arguments
    flows = [asyncio.ensure_future(user(*cred)) for cred in creds]
    for flow in flows:
        await flow

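Caveat programmer: aiohttp appears to store cookies by default, so make sure it doesn't cross-pollinate your user flows.

Bonus points for: correct use of asyncio.gather() in the last async function.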
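
For the bonus point, here is one way the last function might use `asyncio.gather()`. This is only a sketch: `user` is a placeholder that sleeps instead of doing real network work, the credentials are made up, and `everything` takes them as a parameter purely for illustration.

```python
import asyncio


async def user(u, p):
    """Placeholder for a full per-user flow (login, download, post)."""
    await asyncio.sleep(0.01)  # stands in for network I/O
    if not p:
        raise ValueError(f"no password for {u}")
    return f"{u}: done"


async def everything(creds):
    # gather() runs every flow concurrently and returns the results in the
    # same order as creds. With return_exceptions=True, a failing user shows
    # up as an exception object instead of hiding the other users' results.
    return await asyncio.gather(
        *(user(u, p) for u, p in creds), return_exceptions=True
    )


results = asyncio.run(everything([("u1", "p1"), ("u2", ""), ("u3", "p3")]))
for r in results:
    print(repr(r))
```

Whether to pass `return_exceptions=True` is a design choice; without it, the first exception propagates to the caller of `everything()`.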

Use ExitStack.

from contextlib import ExitStack

def create_tasks(..., context):
    tasks = []
    for username in usernames:
        session = aiohttp.ClientSession()
        tasks.append(...)
        context.enter_context(session)
    return tasks

if __name__ == "__main__":
    context = ExitStack()
    tasks = create_tasks(..., context)
    with context:
        loop.run_until_complete(asyncio.gather(*tasks))
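
As far as I know, recent aiohttp releases expect `ClientSession` to be entered with `async with` rather than a plain `with`, in which case `contextlib.AsyncExitStack` (Python 3.7+) is the closer fit. A minimal sketch of the same idea, using a hypothetical `FakeSession` in place of `aiohttp.ClientSession` so that it runs without a network; `login`, `create_tasks`, and `main` here are illustrative as well:

```python
import asyncio
from contextlib import AsyncExitStack


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession."""

    def __init__(self, name):
        self.name = name
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        self.closed = True


async def login(username, session):
    await asyncio.sleep(0)  # stands in for the real request
    return (username, session.closed)


async def create_tasks(usernames, stack, sessions):
    tasks = []
    for username in usernames:
        # The stack now owns the session and will close it on exit.
        session = await stack.enter_async_context(FakeSession(username))
        sessions.append(session)
        tasks.append(login(username, session))
    return tasks


async def main():
    sessions = []
    async with AsyncExitStack() as stack:
        tasks = await create_tasks(["a", "b", "c"], stack, sessions)
        results = await asyncio.gather(*tasks)
    # Leaving the block has closed every session registered on the stack.
    return results, [s.closed for s in sessions]


results, closed_after = asyncio.run(main())
print(results, closed_after)
```

The sessions stay open while the tasks run and are all closed once the `async with` block exits, including when one of the tasks raises.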

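You didn't really explain what kind of tasks you need. A simple GET? Something more complicated? Do you want the work to be specific to each username/password? Do you need to save all the responses at the end?

For this code, I assumed the username/password don't matter, but that can be changed quickly.

Instead of creating the sessions separately as you did, I used a consumer/producer pattern.

Each consumer holds its session in a context manager, and there is no need for the semaphore (because of the queue).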

import asyncio
from concurrent.futures import ProcessPoolExecutor

from aiohttp import ClientSession
from bs4 import BeautifulSoup

LOGIN_URL = "http://example.com/login"
CLIENT_CNT = 10
proc_pool = ProcessPoolExecutor(CLIENT_CNT)


def get_key(text):
    soup = BeautifulSoup(text, "html.parser")
    form = soup.find("form")
    key = form.find("input", attrs={"type": "hidden", "name": "authenticityToken"})
    return key.get("value", None)


async def init_consumer(username: str, password: str, loop, queue):
    loop = loop or asyncio.get_event_loop()
    async with ClientSession(loop=loop) as session:
        # init the session with creds? (the username/password are not used here)
        async with session.get(LOGIN_URL) as login_resp:
            x = await asyncio.ensure_future(loop.run_in_executor(proc_pool, get_key, await login_resp.text()))
            print(x)
        url = await queue.get()
        while url is not None:
            # Do things with session and queue
            async with session.get(url) as resp:
                rsp_as_txt = await resp.text()
            queue.task_done()
            url = await queue.get()


async def generate_tasks(queue):
    tasks = ["http://www.example.com" for i in range(20)]
    # putting all tasks in queue
    for task in tasks:
        await queue.put(task)
    # waiting for all tasks to finish (join() is a coroutine and must be awaited)
    await queue.join()
    # Telling consumers to finish (put() is a coroutine as well)
    for i in range(queue.maxsize):
        await queue.put(None)


async def run(loop):
    queue = asyncio.Queue(CLIENT_CNT)
    usernames = ("a", "b", "c", "d", "e", "f", "g")
    passwords = ("a", "b", "c", "d", "e", "f", "g")
    consumers = [asyncio.ensure_future(init_consumer(u, p, loop, queue)) for u, p in zip(usernames, passwords)]
    await generate_tasks(queue)
    # wait for the consumers to exit so every session is closed before the loop stops
    await asyncio.gather(*consumers)


if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(run(loop=loop))
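
The shutdown sequence is the subtle part of this pattern: `queue.join()` and `queue.put()` are coroutines and have to be awaited, and the consumers should be awaited too so that each one leaves its `async with` block before the loop stops. A network-free sketch of just that sequence, where `consumer`, `producer`, and the integer work items are illustrative names and values rather than part of the code above:

```python
import asyncio


async def consumer(name, queue, handled):
    # In the real code, this loop would sit inside `async with ClientSession()`.
    while True:
        item = await queue.get()
        try:
            if item is None:  # sentinel: no more work for this consumer
                return
            await asyncio.sleep(0)  # stands in for session.get(url)
            handled.append((name, item))
        finally:
            queue.task_done()


async def producer(queue, items, n_consumers):
    for item in items:
        await queue.put(item)
    await queue.join()  # wait until every item has been processed
    for _ in range(n_consumers):
        await queue.put(None)  # one sentinel per consumer


async def run(n_consumers=3, n_items=10):
    queue = asyncio.Queue(maxsize=n_consumers)
    handled = []
    consumers = [
        asyncio.ensure_future(consumer(f"c{i}", queue, handled))
        for i in range(n_consumers)
    ]
    await producer(queue, list(range(n_items)), n_consumers)
    await asyncio.gather(*consumers)  # let every consumer finish cleanly
    return handled


handled = asyncio.run(run())
print(sorted(item for _, item in handled))
```

Sending exactly one sentinel per consumer avoids leaving stray `None` items in the queue, and calling `task_done()` in a `finally` clause keeps `join()` from hanging if handling an item raises.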

