Threading: function seems to run as a blocking loop although I am using threading

Question

I am trying to speed up web scraping by running my HTTP requests in a ThreadPoolExecutor from the concurrent.futures library.

Here is the code:

import concurrent.futures
import requests
from bs4 import BeautifulSoup


urls = [
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=ibfxcfd&showcategories=CFD',
        'https://www.interactivebrokers.eu/en/index.php?f=41634&exch=chix_ca',
        'https://www.interactivebrokers.eu/en/index.php?f=41634&exch=tase',
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=chixen-be&showcategories=STK',
        'https://www.interactivebrokers.eu/en/index.php?f=41295&exch=bvme&showcategories=STK'
        ]

def get_url(url):
    print(url)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'lxml')
    a = soup.select_one('a')
    print(a)


with concurrent.futures.ThreadPoolExecutor(max_workers=12) as executor:
    results = {executor.submit( get_url(url)) : url for url in urls}

    for future in concurrent.futures.as_completed(results):
        try:
            pass
        except Exception as exc:
            print('ERROR for symbol:', results[future])
            print(exc)

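However, judging by the order in which the script prints to the CLI, the requests seem to be sent in a blocking loop.

Also, if I run the code with the following plain loop instead, it takes roughly the same time.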

for u in urls:
    get_url(u)

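I have had some success implementing concurrency with this library before, but I can't figure out what is going wrong here.

I am aware that the asyncio library exists as an alternative, but I would prefer to use threads.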

Answer 1

You are not actually running your get_url calls as tasks; you call them in the main thread and pass the result to executor.submit. This is the concurrent.futures analog of the same mistake made with raw threading.Thread usage. Change:

results = {executor.submit( get_url(url)) : url for url in urls}

to:

results = {executor.submit(get_url, url) : url for url in urls}

That way you pass the function to call, and its arguments, to the submit call (which then runs them in threads for you), and it should parallelize your code.
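To make the difference concrete, here is a minimal sketch that can be run without network access. It assumes a hypothetical fake_fetch function that sleeps for 0.2 seconds in place of the real get_url, and placeholder example.com URLs; neither comes from the question. It also calls future.result() inside the as_completed loop, which is how an exception raised in a worker would reach the except branch.

```python
import concurrent.futures
import time


def fake_fetch(url):
    # Hypothetical stand-in for get_url: sleep instead of a real HTTP request.
    time.sleep(0.2)
    return "fetched " + url


# Placeholder URLs, used only as labels here.
urls = ["https://example.com/page%d" % i for i in range(5)]


def run(buggy):
    """Return (elapsed_seconds, results, errors) for one pass over urls."""
    results, errors = {}, {}
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        if buggy:
            # fake_fetch(url) runs right here in the main thread;
            # only its return value (not a callable) is submitted.
            futures = {executor.submit(fake_fetch(url)): url for url in urls}
        else:
            # The callable and its argument are handed over separately;
            # a worker thread makes the actual call.
            futures = {executor.submit(fake_fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(futures):
            url = futures[future]
            try:
                # result() re-raises any exception stored in the future.
                results[url] = future.result()
            except Exception as exc:
                errors[url] = exc
    return time.perf_counter() - start, results, errors


if __name__ == "__main__":
    for buggy in (True, False):
        elapsed, results, errors = run(buggy)
        print("buggy=%s: %.2fs, %d results, %d errors"
              % (buggy, elapsed, len(results), len(errors)))
```

In the buggy variant the sleeps happen one after another before anything reaches the pool, and each future ends up holding a TypeError because the submitted value is not callable. The loop in the question never calls future.result(), so that error would stay hidden there.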

Answer 1
2 Accepted 2020-12-08 20:37:43
