
Python Requests vs PyCurl Performance

How does the Requests library compare with PyCurl performance-wise?

My understanding is that Requests is a Python wrapper for urllib, whereas PyCurl is a Python wrapper for libcurl, which is native, so PyCurl should get better performance, but I'm not sure by how much.

I can't find any benchmarks comparing the two.

I wrote you a full benchmark, using a trivial Flask application backed by gUnicorn/meinheld + nginx (for performance and HTTPS), measuring how long it takes to complete 10,000 requests. Tests were run in AWS on a pair of unloaded c4.large instances, and the server instance was not CPU-limited.

TL;DR summary: if you're doing a lot of networking, use PyCurl; otherwise use requests. PyCurl finishes small requests 2x-3x as fast as requests until you hit the bandwidth limit with large requests (around 520 Mbit/s or 65 MB/s here), and uses 3x to 10x less CPU power. These figures compare cases where connection pooling behavior is the same; by default, PyCurl uses connection pooling and DNS caching, where requests does not, so a naive implementation will be 10x as slow.
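To illustrate that pooling point, here is a minimal sketch of the naive pattern versus a requests.Session, which reuses connections the way PyCurl does by default. The URL and request count are placeholders, not the benchmark's actual setup:

    import requests

    URL = "https://example.com/"  # hypothetical endpoint

    # Naive: each call opens (and tears down) a fresh TCP/TLS connection.
    for _ in range(100):
        requests.get(URL)

    # Pooled: a Session keeps connections alive between requests,
    # matching pycurl's default reuse behavior.
    with requests.Session() as session:
        for _ in range(100):
            session.get(URL)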

[Figure: combined charts of RPS and CPU time, detailed by request size]

[Figure: HTTP-only throughput and HTTP-only RPS]

Note that double log plots are used for the graph below only, due to the orders of magnitude involved:

[Figure: HTTP and HTTPS throughput; HTTP and HTTPS RPS]

  • pycurl takes about 73 CPU-microseconds to issue a request when reusing a connection
  • requests takes about 526 CPU-microseconds to issue a request when reusing a connection
  • pycurl takes about 165 CPU-microseconds to open a new connection and issue a request (no connection reuse), or ~92 microseconds just to open the connection
  • requests takes about 1078 CPU-microseconds to open a new connection and issue a request (no connection reuse), or ~552 microseconds just to open the connection (a measurement sketch follows this list)
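A minimal sketch of how per-request CPU figures like those above can be measured with pycurl; time.process_time() counts CPU seconds rather than wall-clock time, and the URL and iteration count here are placeholders:

    import time
    from io import BytesIO

    import pycurl

    URL = "https://example.com/"  # hypothetical endpoint
    N = 1000

    handle = pycurl.Curl()  # a single handle, so connections are reused
    handle.setopt(pycurl.URL, URL)

    start = time.process_time()
    for _ in range(N):
        body = BytesIO()
        handle.setopt(pycurl.WRITEDATA, body)  # collect the response body
        handle.perform()
    elapsed = time.process_time() - start

    handle.close()
    print(f"{elapsed / N * 1e6:.0f} CPU-microseconds per request")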

Full results are in the link, along with the benchmark methodology and system configuration.

Caveats: although I've taken pains to ensure the results are collected in a scientific way, it's only testing one system type and one operating system, and a limited subset of performance and especially HTTPS options.

First and foremost, requests is built on top of the urllib3 library; the stdlib urllib and urllib2 libraries are not used at all.
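Because requests delegates its transport to urllib3, the connection pooling it does offer is configured through transport adapters. A minimal sketch; the pool sizes and URL are illustrative, not recommendations:

    import requests
    from requests.adapters import HTTPAdapter

    session = requests.Session()
    adapter = HTTPAdapter(pool_connections=4, pool_maxsize=16)  # urllib3 pool settings
    session.mount("https://", adapter)
    session.mount("http://", adapter)

    response = session.get("https://example.com/")  # hypothetical endpoint
    print(response.status_code)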

There is little point in comparing requests with pycurl on performance. pycurl may use C code for its work, but like all network programming, your execution speed depends largely on the network that separates your machine from the target server. Moreover, the target server could be slow to respond.

In the end, requests has a far more friendly API to work with, and you'll find that you'll be more productive using that friendlier API.
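For a sense of the difference, here is the same GET written both ways; a sketch with a placeholder URL, assuming a UTF-8 response body for the pycurl variant:

    from io import BytesIO

    import pycurl
    import requests

    URL = "https://example.com/"  # hypothetical endpoint

    # requests: one line, with decoding and redirects handled for you.
    text = requests.get(URL).text

    # pycurl: manual buffer management and option setup.
    buffer = BytesIO()
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, URL)
    curl.setopt(pycurl.WRITEDATA, buffer)
    curl.perform()
    curl.close()
    text = buffer.getvalue().decode("utf-8")  # assumes a UTF-8 body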

It seems there is a new kid on the block: pycurl-requests, a Requests-style interface for pycurl.

Thank you for the benchmark - it was nice - I like curl and it seems to be able to do a bit more than HTTP.

https://github.com/dcoles/pycurl-requests
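Going by the project's README, it is meant as a drop-in replacement; a minimal sketch with a placeholder URL:

    import pycurl_requests as requests

    response = requests.get("https://example.com/")  # hypothetical endpoint
    print(response.status_code)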

Focusing on size:

  1. On my MacBook Air with 8 GB of RAM and a 512 GB SSD, for a 100 MB file coming in at 3 kilobytes a second (from the internet over Wi-Fi), pycurl, curl, and the requests library's get function (regardless of chunking or streaming) are pretty much the same.

  2. On a smaller quad-core Intel Linux box with 4 GB of RAM, over localhost (from Apache on the same box), for a 1 GB file, curl and pycurl are 2.5x faster than the requests library. And for requests, chunking and streaming together give a 10% boost (chunk sizes above 50,000); see the streaming sketch after this list.
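A minimal sketch of the chunked/streaming download described in point 2, using the requests streaming API; the URL, output filename, and 64 KB chunk size are illustrative choices:

    import requests

    URL = "https://example.com/large-file.bin"  # hypothetical large file

    with requests.get(URL, stream=True) as response:
        response.raise_for_status()
        with open("large-file.bin", "wb") as out:
            # chunk sizes above ~50,000 bytes gave the ~10% boost noted above
            for chunk in response.iter_content(chunk_size=65536):
                out.write(chunk)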

I thought I was going to have to swap requests out for pycurl, but not so, as the application I'm making isn't going to have the client and server that close together.
