Python socket stress concurrency

I need a Python TCP server that can handle at least tens of thousands of concurrent socket connections. I tried to test the Python SocketServer package's capabilities in both multiprocess and multithreaded modes, but both fell far short of the desired performance.

First, I'll describe the client, since it's common to both cases.

client.py

import socket
import sys
import threading
import time


SOCKET_AMOUNT = 10000
HOST, PORT = "localhost", 9999
data = " ".join(sys.argv[1:])


def client(ip, port, message):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((ip, port))
    while 1:
        sock.sendall(message)
        time.sleep(1)
    sock.close()


for i in range(SOCKET_AMOUNT):
    msg = "test message"
    client_thread = threading.Thread(target=client, args=(HOST, PORT, msg))
    client_thread.start()

Multiprocess server:

forked_server.py

import os
import SocketServer


class ForkedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_process = os.getpid()
        print "launching a new socket handler, pid = {}".format(cur_process)
        while 1:
            self.request.recv(4096)


class ForkedTCPServer(SocketServer.ForkingMixIn, SocketServer.TCPServer):
    pass


if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    server = ForkedTCPServer((HOST, PORT), ForkedTCPRequestHandler)
    print "Starting Forked Server"
    server.serve_forever()

Multithreaded server:

threaded_server.py

import threading
import SocketServer


class ThreadedTCPRequestHandler(SocketServer.BaseRequestHandler):

    def handle(self):
        cur_thread = threading.current_thread()
        print "launching a new socket handler, thread = {}".format(cur_thread)
        while 1:
            self.request.recv(4096)


class ThreadedTCPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    pass


if __name__ == "__main__":
    HOST, PORT = "localhost", 9999

    server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler)
    print "Starting Threaded Server"
    server.serve_forever()

In the first case, with forked_server.py, only about 40 processes get created, and after a while roughly 20 of them start failing with the following error on the client side:

error: [Errno 104] Connection reset by peer

The threaded version is much more durable and holds more than 4000 connections, but eventually starts showing

gaierror: [Errno -5] No address associated with hostname

The tests were run on my local machine: Kubuntu 14.04 x64, kernel 3.13.0-32. These are the steps I took to increase the general performance of the system:

  1. Raise the kernel limit on file handles: sysctl -w fs.file-max=10000000 (see also the per-process check sketched below)
  2. Increase the connection backlog: sysctl -w net.core.netdev_max_backlog=2500
  3. Raise the maximum number of queued connections: sysctl -w net.core.somaxconn=250000
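
For reference, here is a minimal sketch (using the stdlib resource module) of checking the per-process descriptor limit from Python, since fs.file-max is only the system-wide cap and each process is still bound by its own RLIMIT_NOFILE:

import resource

# fs.file-max is system-wide; each process is additionally capped by
# RLIMIT_NOFILE (often 1024 by default), which limits open sockets too.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
if soft < hard:
    # Raising the soft limit up to the hard limit needs no privileges;
    # pushing the hard limit itself higher requires root (ulimit/limits.conf).
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
print("RLIMIT_NOFILE is now {}".format(resource.getrlimit(resource.RLIMIT_NOFILE)[0]))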

So, the questions are:

  1. Were the tests correct? Can I rely on those results? I'm new to all this network/socket stuff, so please correct me if my conclusions are wrong.
  2. Is the multiprocess/multithreaded approach really not viable on a heavily loaded system?
  3. If so, what options are left? An asynchronous approach? The Tornado/Twisted/Gevent frameworks?

socketserver is not going to handle anywhere near 10k connections. No threaded or forked server will on current hardware and OSes. Thousands of threads means you spend more time context-switching and scheduling than actually working. Modern Linux is getting very good at scheduling threads and processes, and Windows is pretty good with threads (but horrible with processes), but there's a limit to what it can do.

And socketserver doesn't even try to be high-performance.

And of course CPython's GIL makes things worse. If you're not using 3.2+, any thread doing even a trivial amount of CPU-bound work is going to choke all of the other threads and block your I/O. With the new GIL, if you avoid non-trivial CPU work you don't add too much to the problem, but it still makes context switches more expensive than raw pthreads or Windows threads.


So, what do you want?

You want a single-threaded "reactor" that services events in a loop and kicks off handlers. (On Windows and Solaris, there are advantages to instead using a "proactor", a pool of threads that all service the same event queue, but since you're on Linux, let's not worry about that.) Modern OSes have very good multiplexing APIs to build on (kqueue on BSD/Mac, epoll on Linux, /dev/poll on Solaris, IOCP on Windows) that can easily handle 10K connections even on hardware from years ago.
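
To make that concrete, here is a minimal sketch of such a reactor built directly on Linux epoll (assumptions: Linux, Python 2.6+, and the same localhost:9999 endpoint the question uses). It only accepts connections and drains whatever the clients send, which is all the test client above needs:

import select
import socket

# Listening socket: non-blocking and registered with epoll.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("localhost", 9999))
server.listen(1024)
server.setblocking(0)

epoll = select.epoll()
epoll.register(server.fileno(), select.EPOLLIN)
connections = {}

while True:
    for fd, event in epoll.poll():
        if fd == server.fileno():
            # New connection: accept it and watch it for readability.
            conn, addr = server.accept()
            conn.setblocking(0)
            connections[conn.fileno()] = conn
            epoll.register(conn.fileno(), select.EPOLLIN)
        elif event & (select.EPOLLHUP | select.EPOLLERR):
            epoll.unregister(fd)
            connections.pop(fd).close()
        elif event & select.EPOLLIN:
            data = connections[fd].recv(4096)
            if not data:  # orderly shutdown from the peer
                epoll.unregister(fd)
                connections.pop(fd).close()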

socketserver isn't a terrible reactor, it's just that it doesn't provide any good way to dispatch asynchronous work, only threads or processes. In theory, you could build a GreenletMixIn (with the greenlet extension module) or a CoroutineMixIn (assuming you either have or know how to write a trampoline and scheduler) without too much work on top of socketserver, and that might not be too heavyweight. But I'm not sure how much benefit you're getting out of socketserver at that point.

Parallelism can help, but only to dispatch slow jobs off the main work thread. First get your 10K connections up, doing minimal work. Then, if the real work you want to add is I/O-bound (e.g., reading files, or making requests to other services), add a pool of threads to dispatch to; if you need to add a lot of CPU-bound work, add a pool of processes instead (or, in some cases, even one of each).
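
A sketch of that dispatch pattern, assuming the concurrent.futures module (stdlib in Python 3.2+, available on PyPI as the futures backport for 2.x); the reactor thread calls submit() and goes straight back to servicing sockets:

from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=8)

def slow_io_job(path):
    # Blocking I/O runs on a worker thread, not on the reactor.
    with open(path) as f:
        return f.read()

def on_done(future):
    # This runs on the worker thread once the job finishes; a real reactor
    # needs a thread-safe hand-off (e.g. a pipe it watches) to get the
    # result back onto the event loop.
    result = future.result()

# "/etc/hostname" is just an illustrative file that exists on Linux.
pool.submit(slow_io_job, "/etc/hostname").add_done_callback(on_done)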

If you can use Python 3.4, the stdlib has an answer in asyncio (and there's a backport on PyPI for 3.3, but it's inherently impossible to backport to earlier versions).
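
Here is a minimal asyncio sketch (Python 3.4 syntax, so no async/await) of a server that just drains what the clients send, again assuming the question's localhost:9999 endpoint:

import asyncio

class DrainProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        pass  # the test clients only send; nothing to answer

loop = asyncio.get_event_loop()
server = loop.run_until_complete(
    loop.create_server(DrainProtocol, "localhost", 9999))
try:
    loop.run_forever()
finally:
    server.close()
    loop.close()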

If not… well, you can build something yourself on top of selectors in 3.4+ if you don't care about Windows, or select in 2.6+ if you only care about Linux, *BSD, and Mac and are willing to write two versions of your code, but it's going to be a lot of work. Or you can write your core event loop in C (or just use an existing one like libev or libuv or libevent) and wrap it in an extension module.

But really, you probably want to turn to third-party libraries. There are many of them, with very different APIs, from gevent (which tries to make your code look like preemptively threaded code but actually runs in greenlets on a single-threaded event loop) to Twisted (which is based around explicit callbacks and futures, similar to many modern JavaScript frameworks).
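
As an example of the gevent style, here is a sketch of the same drain-everything server (assuming gevent is installed); each connection gets its own greenlet on top of a single-threaded event loop, so the blocking-looking recv() calls don't tie up OS threads:

from gevent.server import StreamServer

def handle(sock, address):
    # Runs in its own greenlet per connection.
    while True:
        data = sock.recv(4096)
        if not data:  # client closed the connection
            break

StreamServer(("localhost", 9999), handle).serve_forever()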

StackOverflow isn't a good place to get recommendations for specific libraries, but I can give you a general recommendation: look them over, pick the one whose API sounds best for your application, test whether it's good enough, and only fall back to another one if the one you like can't cut it (or if you turn out to be wrong about liking the API). Fans of some of these libraries (especially gevent and tornado) will tell you that their favorite is "fastest", but who cares about that? What matters is whether they're fast enough and usable enough to write your app.

Off the top of my head, I'd search for gevent, eventlet, concurrence, cogen, twisted, tornado, monocle, diesel, and circuits. That probably isn't a great list, but if you google all those terms together, I'll bet you'll find an up-to-date comparison, or an appropriate forum to ask on.

This guy seemed to have a pretty good solution using threading and subprocess:

#!/usr/bin/env python
# ssl_load.py - Corey Goldberg - 2008

import httplib
from threading import Thread

threads = 250
host = '192.168.1.14'
file = '/foo.html'

def main():
    for i in range(threads):
        agent = Agent()
        agent.start()

class Agent(Thread):
    def __init__(self):
        Thread.__init__(self)

    def run(self):
        while True:
            conn = httplib.HTTPSConnection(host)
            conn.request('GET', file)
            resp = conn.getresponse()

if __name__ == '__main__':
    main()

Windows XP constraints allowed him at most 250 threads per process, and that was on hardware that is pretty poor compared to today's standards. He was able to reach a maximum of about 15k threads by running the script as multiple processes, as shown here:

#!/usr/bin/env python

import subprocess
processes = 60
for i in range(processes):
    subprocess.Popen('python ssl_load.py') 

Hope this helps you out!
