
Python, Concurrency and asyncio: Problem adding a rotating proxy

I'm creating an optimized multi-threading app using asyncio and want to add a rotating proxy into the mix.

Starting with a sample taken from this outstanding article:

Speed Up Your Python Program With Concurrency

I added a rotating proxy and it stopped working. The code simply exits the function after hitting the line that selects the proxy.


This little snippet of code works on its own, but not when added to the main script (shown in full below).

import asyncio
import random as rnd

async def download_site():
    proxy_list = [
        '38.39.205.220:80',
        '38.39.204.100:80',
        '38.39.204.101:80',
        '38.39.204.94:80',
    ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)

asyncio.run(download_site())

And here's the full sample:

import asyncio
import random as rnd
import time
import aiohttp

# Sample code taken from here:
# https://realpython.com/python-concurrency/#asyncio-version

# Info for adding headers for the proxy (Scroll toward the bottom)
# https://docs.aiohttp.org/en/stable/client_advanced.html

# Good read to possibly improve performance on large lists of URLs
# https://asyncio.readthedocs.io/en/latest/webscraper.html


# RUN THIS METHOD TO SEE HOW IT WORKS.
# # Original Code (working...)  
# async def download_site(session, url):
#     async with session.get(url, proxy="http://proxy.com") as response:
#         print("Read {0} from {1}".format(response.content_length, url))

def get_proxy():
    # Note: this helper is not called below; download_site picks its own proxy.
    proxy_list = [
        (754, '38.39.205.220:80'),
        (681, '38.39.204.100:80'),
        (682, '38.39.204.101:80'),
        (678, '38.39.204.94:80'),
    ]
    proxy = rnd.choice(proxy_list)
    print(proxy[1])
    return proxy


async def download_site(session, url):
    proxy_list = [
        '38.39.205.220:80',
        '38.39.204.100:80',
        '38.39.204.101:80',
        '38.39.204.94:80',
    ]
    await asyncio.sleep(1)
    proxy = rnd.choice(proxy_list)
    print(proxy)
    async with session.get(url, proxy="http://" + proxy) as response:
        print("Read {0} from {1}".format(response.content_length, url))


async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.ensure_future(download_site(session, url))
            tasks.append(task)
        await asyncio.gather(*tasks, return_exceptions=True)


# Modified to loop thru only 1 URL to make debugging simple
if __name__ == "__main__":
    sites = [
        "https://www.jython.org",
       # "http://olympus.realpython.org/dice",
    ] #* 80
    start_time = time.time()
    asyncio.get_event_loop().run_until_complete(download_all_sites(sites))
    duration = time.time() - start_time
    print(f"Downloaded {len(sites)} sites in {duration} seconds")

Thank you for any help you can offer.

You use return_exceptions=True but you don't actually check the returned results for errors. You can use asyncio.as_completed to handle exceptions and get each result as soon as it completes:

import asyncio
import random
import traceback

import aiohttp


URLS = ("https://stackoverflow.com",)
TIMEOUT = 5
PROXIES = (
    "http://38.39.205.220:80",
    "http://38.39.204.100:80",
    "http://38.39.204.101:80",
    "http://38.39.204.94:80",
)


def get_proxy():
    return random.choice(PROXIES)


async def download_site(session, url):
    proxy = get_proxy()

    print(f"Got proxy: {proxy}")

    async with session.get(url, proxy=proxy, timeout=TIMEOUT) as resp:
        print(f"{url}: {resp.status}")
        return await resp.text()


async def main():
    tasks = []

    async with aiohttp.ClientSession() as session:
        for url in URLS:
            tasks.append(asyncio.create_task(download_site(session, url)))

        for coro in asyncio.as_completed(tasks):
            try:
                html = await coro
            except Exception:
                traceback.print_exc()
            else:
                print(len(html))


if __name__ == "__main__":
    asyncio.run(main())
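
If you prefer to stick with asyncio.gather and return_exceptions=True, you must inspect the returned list yourself: every failure (a refused proxy connection, a timeout, and so on) comes back as an exception object inside the results instead of being raised. Here is a minimal sketch of that pattern; it reuses the download_site, URLS, and PROXIES definitions from the code above:

import asyncio

import aiohttp


async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(download_site(session, url)) for url in URLS]
        # gather preserves task order; with return_exceptions=True, failed
        # downloads appear in the list as exception objects instead of raising.
        results = await asyncio.gather(*tasks, return_exceptions=True)

    for url, result in zip(URLS, results):
        if isinstance(result, Exception):
            print(f"{url} failed: {result!r}")
        else:
            print(f"{url}: {len(result)} characters")


if __name__ == "__main__":
    asyncio.run(main())

Note that gather waits for every task to finish before returning, so as_completed remains the better choice when you want to react to each result as soon as it arrives.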
