
Python, Concurrency and asyncio: Problem adding a rotating proxy

I'm building an optimized multithreaded application using asyncio and would like to add a rotating proxy to the mix.

I started from a sample in this excellent article:

Speed Up Your Python Program With Concurrency

I added a rotating proxy and it stopped working. The code simply exits the function as soon as it hits the proxy line.

[Screenshot: the script exits right after the proxy line, with no error shown]

This small snippet works on its own, but fails when added to the main script, as the screenshot above shows.

import asyncio
import random as rnd
 
async def download_site():
    proxy_list = [
        ('38.39.205.220:80'),
        ('38.39.204.100:80'),
        ('38.39.204.101:80'),
        ('38.39.204.94:80')
        ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)
 
asyncio.run(download_site())

Here is the full sample:

import asyncio
import time
import aiohttp

# Sample code taken from here:
# https://realpython.com/python-concurrency/#asyncio-version

# Info for adding headers for the proxy (Scroll toward the bottom)
# https://docs.aiohttp.org/en/stable/client_advanced.html

# Good read to possible improve performance on large lists of URLs
# https://asyncio.readthedocs.io/en/latest/webscraper.html


# RUN THIS METHOD TO SEE HOW IT WORKS.
# # Original Code (working...)  
# async def download_site(session, url):
#     async with session.get(url, proxy="http://proxy.com") as response:
#         print("Read {0} from {1}".format(response.content_length, url))

def get_proxy(self):
    proxy_list = [
    (754, '38.39.205.220:80'),
    (681, '38.39.204.100:80'),
    (682, '38.39.204.101:80'),
    (678, '38.39.204.94:80')
    ]
    proxy = random.choice(proxy_list)
    print(proxy[1])
    return proxy


async def download_site(session, url):
    proxy_list = [
        ('38.39.205.220:80'),
        ('38.39.204.100:80'),
        ('38.39.204.101:80'),
        ('38.39.204.94:80')
        ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)
    async with session.get(url, proxy="http://" + proxy) as response:
        print("Read {0} from {1}".format(response.content_length, url))


async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)


# Modified to loop thru only 1 URL to make debugging simple
if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
       # "http://olympus.realpython.org/dice",
    ] #* 80
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")

Thanks for any help you can provide.

You pass `return_exceptions=True` but never actually check the returned results for errors, so any exception is silently swallowed. You can use `asyncio.as_completed` instead, which lets you handle exceptions explicitly and get each result as soon as it is ready:

import asyncio
import random
import traceback

import aiohttp


URLS = ("https://stackoverflow.com",)
TIMEOUT = 5
PROXIES = (
    "http://38.39.205.220:80",
    "http://38.39.204.100:80",
    "http://38.39.204.101:80",
    "http://38.39.204.94:80",
)


def get_proxy():
    return random.choice(PROXIES)


async def download_site(session, url):
    proxy = get_proxy()

    print(f"Got proxy: {proxy}")

    async with session.get(url, proxy=f"{proxy}", timeout=TIMEOUT) as resp:
        print(f"{url}: {resp.status}")
        return await resp.text()


async def main():
    tasks = []

    async with aiohttp.ClientSession() as session:
        for url in URLS:
            tasks.append(asyncio.create_task(download_site(session, url)))

        for coro in asyncio.as_completed(tasks):
            try:
                html = await coro
            except Exception:
                traceback.print_exc()
            else:
                print(len(html))


if __name__ == "__main__":
    asyncio.run(main())
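For completeness: if you prefer to keep `asyncio.gather(..., return_exceptions=True)`, the key point is that failures then come back as exception *objects* inside the results list, and you must inspect them yourself or they pass silently (which is exactly why the original script appeared to "just exit"). A minimal sketch of that pattern, with a hypothetical `fetch` coroutine standing in for the real download (the URLs here are placeholders):

```python
import asyncio

async def fetch(url):
    # Stand-in for the real session.get(); fails for one URL
    # to demonstrate how gather reports errors.
    if "bad" in url:
        raise ValueError(f"cannot reach {url}")
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"

async def download_all(urls):
    # With return_exceptions=True, exceptions are returned as
    # values in the results list instead of propagating.
    return await asyncio.gather(
        *(fetch(u) for u in urls), return_exceptions=True
    )

urls = ["https://example.com", "https://bad.example"]
results = asyncio.run(download_all(urls))
for url, result in zip(urls, results):
    if isinstance(result, Exception):
        print(f"{url} failed: {result}")
    else:
        print(f"{url}: ok")
```

Without the `isinstance` check at the end, the `ValueError` would sit unnoticed in the results list, which is the failure mode the question ran into.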

