
What should I use to speed up this code? (Multiprocessing vs Multithreading)

Could you also show me how? Thanks in advance.

Here's the code:

import requests

def test():
    with open("proxies.txt", "r") as f:
        for line in f:
            proxy = line.strip()
            try:
                r = requests.get('http://www.icanhazip.com/',
                                 proxies={'http': 'http://' + proxy},
                                 timeout=1)
                print(r.status_code)
            except (requests.exceptions.ProxyError, requests.exceptions.ConnectTimeout,
                    requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError):
                print("Doesn't work")

Multiprocessing or multithreading should start only when the function is called.

I would think threading would be best, since it doesn't seem that you are performing a large amount of computational work in each worker. Because each request spends most of its time waiting on the network, threads can overlap that waiting even with the GIL. Sub-processes incur a decent amount of startup overhead, and are therefore better suited to tasks requiring heavy computation.
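As a rough illustration of why threads help for I/O-bound work like this (a sketch, with `time.sleep` standing in for the network wait):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    # Stand-in for a network call: the thread just waits, releasing the GIL.
    time.sleep(0.2)
    return "ok"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=8) as executor:
    results = list(executor.map(fake_request, range(8)))
elapsed = time.perf_counter() - start

# Eight 0.2 s waits overlap, so the total is close to 0.2 s, not 1.6 s.
print(f"{len(results)} tasks in {elapsed:.2f} s")
```

The same eight calls run sequentially would take about 1.6 s; with eight threads they finish together.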

Two observations:

  1. You can try using the ThreadPoolExecutor or ProcessPoolExecutor from the concurrent.futures library, so you can parallelize the execution.

  2. You may want to see if creating an explicit requests Session and reusing it speeds this up. This may save some cost in TLS renegotiation/handshakes. Note that you may need to be careful with cookies, as a reused session shares a single cookie jar by default.
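If the shared cookie jar is a concern, one option is to give each worker thread its own session via `threading.local`. The sketch below uses a placeholder dict so it runs standalone; in real code the lazily created object would be a `requests.Session`:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

thread_local = threading.local()

def get_session():
    # Lazily create one session object per thread.
    # In real code: thread_local.session = requests.Session()
    if not hasattr(thread_local, "session"):
        thread_local.session = {"owner": threading.get_ident()}
    return thread_local.session

def worker(_):
    # Every task in the same thread sees the same session object.
    return id(get_session())

with ThreadPoolExecutor(max_workers=4) as executor:
    session_ids = set(executor.map(worker, range(20)))

# At most one distinct session per worker thread, never one per task.
print(len(session_ids))
```

This keeps connection reuse within each thread while avoiding one shared cookie jar across all workers.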

Untested, quickly scratched-together example:

import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

session = requests.Session()

def do_request(line):
    proxy = line.strip()
    r = session.get('http://www.icanhazip.com/',
                    proxies={'http': 'http://' + proxy},
                    timeout=1)
    return r.status_code

with ThreadPoolExecutor(max_workers=8) as executor, \
        open("proxies.txt", "r") as f:
    # submit/as_completed (rather than executor.map) gives us real Future
    # objects, so a failed proxy doesn't abort iteration over the results.
    futures = [executor.submit(do_request, line) for line in f]
    for future in as_completed(futures):
        try:
            print(future.result())
        except (requests.exceptions.ProxyError, requests.exceptions.ConnectTimeout,
                requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError):
            print("Doesn't work")
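Alternatively, if the worker catches its own exceptions and returns a sentinel, `executor.map` can be used directly, since no exception ever propagates into the result iteration. A self-contained sketch, with the hypothetical `fake_get` standing in for `session.get`:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_get(proxy):
    # Stand-in for session.get(...); raises for proxies we pretend are dead.
    if proxy.startswith("bad"):
        raise ConnectionError(proxy)
    return 200

def check_proxy(line):
    proxy = line.strip()
    try:
        return proxy, fake_get(proxy)
    except ConnectionError:
        # Returning a sentinel instead of raising keeps map() iteration going.
        return proxy, None

lines = ["1.2.3.4:8080\n", "bad.proxy:3128\n", "5.6.7.8:80\n"]
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(check_proxy, lines))

for proxy, status in results:
    print(proxy, status if status is not None else "Doesn't work")
```

With `map`, results come back in input order, which can be handy if you want to match each status to its proxy line.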
