
How to send 10,000 HTTP requests concurrently using Python

I need to get an HTTP GET response from each of the top 1 million domains, and I want to open as many concurrent threads as possible so I can finish faster. The only relevant post I found is What is the fastest way to send 100,000 HTTP requests in Python?, and its solution, which uses concurrent.futures, works as expected.
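For reference, the concurrent.futures pattern from that post looks roughly like this (a minimal sketch; DOMAINS is a placeholder for the real domain list):

from concurrent.futures import ThreadPoolExecutor, as_completed
import requests

DOMAINS = ['https://www.google.com', 'https://www.example.com']  # placeholder list

def fetch(url):
    # use a timeout so one slow host cannot stall a worker forever
    return requests.get(url, timeout=5).status_code

with ThreadPoolExecutor(max_workers=1000) as pool:
    futures = {pool.submit(fetch, url): url for url in DOMAINS}
    for future in as_completed(futures):
        url = futures[future]
        try:
            print(url, future.result())
        except Exception as exc:
            print(url, 'failed:', exc)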

However, the problem is that as I set the number of workers higher, the performance gain seems to stagnate: I don't notice any difference between 1,000 and 10,000 workers. I run it on a paid EC2 instance and can see that I am only using a fraction of the available CPU and memory. Not sure what is happening; is there a limit on how many concurrent threads I can create? Can I override the limit?

I find there isn't much difference between urllib3 and requests (requests might be a shade faster). I would use an async library, since this is a prime use case.

from gevent import monkey, spawn, joinall
monkey.patch_all()  # patch the stdlib before urllib3 is imported so its sockets cooperate with gevent
import urllib3, certifi
from time import time

threads = []
url = 'https://www.google.com'
upool = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
                            num_pools=20, block=False)

t0 = time()
for i in range(10000):
    # each spawn() schedules one GET request on its own greenlet
    threads.append(spawn(upool.request, 'GET', url))

x = joinall(threads)  # wait for all greenlets; returns the finished greenlets

print(len(x))
print(time() - t0)
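joinall() returns the greenlets themselves; a finished greenlet exposes its result via .value (None if it raised), so the responses can be checked afterwards, e.g.:

# count successful responses; g.value is the urllib3 HTTPResponse,
# or None if the request raised
ok = sum(1 for g in x if g.value is not None and g.value.status == 200)
print(ok, 'requests returned 200')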

Notice that you can cap the number of connections used at once by setting block=True.
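For example (maxsize=50 here is an illustrative value, not from the original):

upool = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
                            num_pools=20, maxsize=50, block=True)
# with block=True each host pool hands out at most `maxsize` connections;
# extra greenlets wait for a free connection instead of opening new sockets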

* UPDATE FOR MULTIPROCESSING *

from gevent import monkey, spawn, joinall
monkey.patch_all()  # patch before urllib3 is imported so its sockets cooperate with gevent
import urllib3, certifi
from time import time
import gipc

worker = {}
num_threads = 1000

def fetch(num_threads, url, cpu):
    print('starting {}'.format(cpu))
    threads = []
    upool = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where(),
                                num_pools=20, block=False)
    t0 = time()
    for i in range(num_threads):
        threads.append(spawn(upool.request, 'GET', url))
    x = joinall(threads)
    # note: gipc.start_process discards this return value; see the sketch below
    return x, time() - t0

def count_cpus():
    import multiprocessing
    cpus = multiprocessing.cpu_count()
    print(cpus)
    return cpus

def multicore(url):
    global worker
    # one gevent-driven worker process per CPU core
    with gipc.pipe() as (r, w):  # the pipe is unused here; fetch() results are dropped
        for cpu in range(count_cpus()):
            worker[str(cpu)] = gipc.start_process(target=fetch, args=(num_threads, url, cpu))
    for work in worker:
        worker[work].join()
    return worker

if __name__ == '__main__':
    multicore('https://www.google.com')

    for work in worker:
        print(worker[work])
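Since a child process's return value is discarded, collecting the per-process timings means sending them back through a gipc pipe. A sketch (fetch_and_report is a hypothetical wrapper; one pipe per worker, since a gipc handle can be passed to only one child process):

def fetch_and_report(writer, num_threads, url, cpu):
    # hypothetical wrapper: run fetch() and push its result through the pipe
    x, elapsed = fetch(num_threads, url, cpu)
    writer.put((cpu, len(x), elapsed))

def multicore_collect(url):
    readers, procs = [], []
    for cpu in range(count_cpus()):
        r, w = gipc.pipe()
        procs.append(gipc.start_process(target=fetch_and_report,
                                        args=(w, num_threads, url, cpu)))
        readers.append(r)
    results = [r.get() for r in readers]  # one (cpu, count, seconds) tuple per worker
    for p in procs:
        p.join()
    return results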
