
GIL is killing I/O-bound thread

Original: 2016-04-02 20:11:36 6 2 python/ multithreading/ cpython/ gil

I've got a website written mostly in Python. The Python process that handles Python-bound requests has a dispatch thread which fetches requests from the web server and simply dispatches them to a thread pool for handling. The work done in the dispatch thread is thus pretty simple; it just reads requests over a Unix socket and does a bit of synchronization on the thread pool. Under normal circumstances, it is capable of dispatching over 2,000 requests per second.

Something weird happens sometimes, however. One part of the website does some image processing on uploaded files, and since the image processing algorithm is written entirely in Python, it takes a bit of time, spinning on the CPU. On larger images, it can take 5 seconds or more. That's fine in itself, though; the weird thing is that while it does its processing, throughput on the dispatch thread drops tremendously. While the image processor is running, dispatch throughput drops to some 20-30 requests per second -- a drop of almost two orders of magnitude!

This causes some minor trouble for me, since during busy hours the Python handler receives some 50-100 requests per second and is therefore unable to keep up. For image processing requests that take some 3 seconds or more, the buffers start filling up and the web server is consequently forced to start dropping requests bound for Python.

I wrote a visualization tool to help debug the problem, and this image (cropped above) demonstrates what is happening. The dispatch of each request is plotted as a line along the X axis, each subsequent request being plotted on subsequent Y coordinates. Each vertical grid-line illustrates a second, and the red grid-line is where my HTTP server logs that it is starting to drop requests. It can clearly be seen that the dispatch rate slows down a lot about 2.5 seconds prior to that, and comparison with the access logs shows that this is where the image processor kicked off.

My hypothesis is that this is because the CPU-bound image processor thread is hogging the GIL, and that the dispatcher has to wait for some particular "processing window" to complete until the CPU-bound thread voluntarily releases the GIL for other threads to run. The dispatcher thread, for its part, releases the GIL each time it goes into a blocking syscall and then has to wait for another entire processing window to complete before it is allowed to process the next request.

If this hypothesis is correct, then I realize that I could fix this problem by forking off a separate process to do the image processing work. That would complicate the code and make it uglier, however, so I'd like to avoid that if possible.
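
For reference, a minimal sketch of what moving the CPU-bound work into a separate process could look like with the standard library's `concurrent.futures`. The names `process_image` and `submit_image_job` are hypothetical, and `process_image` is only a stand-in CPU-bound loop, not the actual image algorithm from the question.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor


def process_image(pixels: bytes) -> int:
    """Hypothetical stand-in for the pure-Python image algorithm (CPU-bound)."""
    total = 0
    for value in pixels:
        total = (total * 31 + value) % 1000003
    return total


def submit_image_job(executor: Executor, pixels: bytes) -> Future:
    """Hand the work to an executor and return immediately with a Future."""
    return executor.submit(process_image, pixels)


def main() -> None:
    # With a ProcessPoolExecutor the loop runs in a child process with its
    # own interpreter and its own GIL, so it cannot hold the GIL of the
    # process that runs the dispatch thread.
    with ProcessPoolExecutor(max_workers=1) as pool:
        future = submit_image_job(pool, bytes(range(256)) * 1000)
        print(future.result())
```

`main()` is meant to be called under an `if __name__ == "__main__":` guard, since some multiprocessing start methods re-import the main module in the child. The arguments and the result are pickled across the process boundary, which is part of the extra complexity the question wants to avoid.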

Thus: Is there any way to avoid this apparent GIL problem? Can I make it so that the dispatcher thread doesn't relinquish the GIL so easily, allowing it to work off some backlog in between processing windows? Can the GIL CPU window be "tweaked", or can I perhaps assign some lower "GIL priority" to the CPU-bound thread or something like that? Is there some other way around it? Or have I perhaps misunderstood the problem entirely?
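
On the "tweaking" part of the question: CPython 3.2 and later expose the interval after which a running thread is asked to give up the GIL, through `sys.getswitchinterval()` and `sys.setswitchinterval()`. The sketch below only shows how to change it temporarily; the helper name `switch_interval` is hypothetical, and whether a shorter interval actually helps this particular workload is an assumption that would have to be measured.

```python
import sys
from contextlib import contextmanager


@contextmanager
def switch_interval(seconds):
    """Temporarily change CPython's thread switch interval (in seconds)."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        yield previous
    finally:
        # Restore the old value even if the body raised.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    with switch_interval(0.001) as old:
        print("was", old, "now", sys.getswitchinterval())
```

The setting is process-wide, not per thread, so it cannot express a "GIL priority" for one thread. A shorter interval also means more frequent switching overhead for the CPU-bound thread.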

Sorry for being long-winded, but I couldn't really figure out a more concise way to describe the situation.

2 solutions

I did manage to figure out why this happened. As it turns out, it was not so much blocking syscalls that were a problem in themselves, but that part of the implementation of the thread pool made the dispatch thread wait until a worker thread could acknowledge that it had taken the request (for accounting reasons, basically) by way of signalling a condition variable that the dispatch thread waited on.
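
To make the pattern concrete, here is a minimal sketch of such a lock-step handoff. It is not the actual thread pool from this site; the class `LockStepSlot` and its methods are hypothetical and assume a single dispatch thread.

```python
import threading


class LockStepSlot:
    """One-request handoff where the dispatcher waits for a worker's ack."""

    def __init__(self):
        self._cond = threading.Condition()
        self._request = None
        self._has_request = False

    def dispatch(self, request):
        """Called by the dispatch thread; returns only once a worker took it."""
        with self._cond:
            self._request = request
            self._has_request = True
            self._cond.notify_all()      # wake a waiting worker
            # Lock-step part: block until a worker acknowledges the request.
            while self._has_request:
                self._cond.wait()

    def take(self):
        """Called by a worker thread; blocks until a request is available."""
        with self._cond:
            while not self._has_request:
                self._cond.wait()
            request = self._request
            self._request = None
            self._has_request = False
            self._cond.notify_all()      # acknowledge to the dispatcher
            return request
```

Every `dispatch()` needs the GIL to pass from the dispatcher to a worker and back again before the next request can be read, and each time one of them blocks, a CPU-bound thread gets a chance to take the GIL instead.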

I tried reimplementing the thread pool such that the dispatch thread could simply post the request without having to work in lock-step with a worker thread, and that seems to have made the problem go away entirely. Visualizing the request dispatching over a period of image processing now shows no slow-down whatsoever. Presumably, then, the switching of the GIL between two threads created a larger window for the third, CPU-bound thread to snatch it for a longer period.
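
A minimal sketch of the "just post it" approach, using `queue.Queue` from the standard library. Again, this is not the real implementation; `PostingPool`, `post` and `close` are hypothetical names, and error handling in the workers is left out.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to exit


class PostingPool:
    """Thread pool where the dispatcher posts requests and moves on."""

    def __init__(self, handler, workers=4):
        self._handler = handler
        self._queue = queue.Queue()  # unbounded, so put() does not wait for room
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def post(self, request):
        """Called by the dispatch thread; does not wait for any worker."""
        self._queue.put(request)

    def _run(self):
        while True:
            request = self._queue.get()
            if request is _STOP:
                return
            self._handler(request)

    def close(self):
        """Let queued requests drain, then stop and join all workers."""
        for _ in self._threads:
            self._queue.put(_STOP)
        for thread in self._threads:
            thread.join()
```

The dispatcher can now enqueue many requests in a row while it holds the GIL, instead of giving it up once per request to wait for an acknowledgment. `concurrent.futures.ThreadPoolExecutor.submit` behaves similarly in that it returns without waiting for a worker.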

The lesson to be learned, then, I guess, is that current CPython (I'm using 3.4.2 on the server running this) seems to be fine with mixing I/O-bound and CPU-bound threads, but that two or more threads working in lock-step with each other may be starved by a CPU-bound thread.
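
For anyone who wants to check this on their own interpreter, a rough measurement sketch: it counts how many ping-pong handoffs two threads manage in a fixed time, with and without a pure-Python busy loop running alongside. The function `count_roundtrips` is hypothetical, and the numbers will vary with CPython version and machine, so no expected output is given.

```python
import queue
import threading
import time


def count_roundtrips(duration, cpu_hog=False):
    """Count ping-pong handoffs between two threads within `duration` seconds."""
    stop = threading.Event()
    ping, pong = queue.Queue(), queue.Queue()

    def echo():
        # Second half of the lock-step pair: send back whatever arrives.
        while True:
            item = ping.get()
            if item is None:
                return
            pong.put(item)

    def spin():
        # Pure-Python busy loop that competes for the GIL until told to stop.
        x = 0
        while not stop.is_set():
            x += 1

    threads = [threading.Thread(target=echo)]
    if cpu_hog:
        threads.append(threading.Thread(target=spin))
    for thread in threads:
        thread.start()

    count = 0
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        ping.put(count)
        pong.get()          # wait for the echo before sending the next one
        count += 1

    stop.set()
    ping.put(None)
    for thread in threads:
        thread.join()
    return count


if __name__ == "__main__":
    print("without CPU-bound thread:", count_roundtrips(1.0))
    print("with CPU-bound thread:   ", count_roundtrips(1.0, cpu_hog=True))
```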

I believe you have a correct idea of the problem. To me, the most straightforward way to resolve this is to replace the threading model with a multiprocessing one. It would be much more complicated to avoid GIL issues within the same process compared to simply spawning a separate process. In Python, there is no direct way (to my knowledge anyway) to change a thread's priority.

The only alternative that stays in the same process exists if you have written the image processing tool yourself and wrapped it with Cython; then you could use the nogil option to release the GIL while the image processing occurs.

If you plan to make the website more robust, you could manage your workers with Celery. In the long run your web site would definitely be helped by having longer-running tasks managed separately from the process(es) managing web I/O, but it would require you to set up some additional infrastructure on top of your simple web process.