
Production ready Python apps on Kubernetes

I have been deploying apps to Kubernetes for the last 2 years, and in my org all our apps (especially the stateless ones) run in Kubernetes. I still have a fundamental question, because very recently we found some issues with a few of our Python apps.

Initially, when we deployed our Python apps (written in Flask and Django), we ran them using python app.py. Because of the GIL, only one thread in a Python process can execute Python code at a time, so the process effectively serves one request at a time; if that one request is CPU-heavy, it cannot process further requests. This sometimes causes the health API to stop responding. We have observed that, at such moments, a single request that is not I/O-bound and is doing some computation holds the CPU, and no other request can be processed in parallel. And since the pod is doing only that small amount of work overall, we have observed no increase in CPU utilization either. This has an impact on how the HorizontalPodAutoscaler works: it is unable to scale the pods.
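This failure mode can be reproduced with the stdlib alone. Here is a minimal sketch (endpoint names are made up) that uses wsgiref's single-threaded server as a stand-in for a dev-server-style python app.py deployment: the health check gets queued behind a CPU-bound request and is only answered once that request finishes.

```python
import threading
import time
import urllib.request
from wsgiref.simple_server import make_server

def app(environ, start_response):
    if environ["PATH_INFO"] == "/work":   # hypothetical CPU-heavy endpoint
        deadline = time.time() + 1.0
        while time.time() < deadline:     # busy loop: holds the GIL and the only worker
            pass
        body = b"done"
    else:                                 # anything else, e.g. /healthz
        body = b"ok"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

server = make_server("127.0.0.1", 0, app)  # wsgiref serves one request at a time
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Fire a CPU-heavy request, then try the health check while it is running.
threading.Thread(target=urllib.request.urlopen,
                 args=(f"http://127.0.0.1:{port}/work",), daemon=True).start()
time.sleep(0.2)  # let /work get picked up first

start = time.time()
urllib.request.urlopen(f"http://127.0.0.1:{port}/healthz").read()
blocked_for = time.time() - start  # the health check waited for /work to finish
server.shutdown()
```

With a real liveness probe pointed at /healthz, a delay like this is exactly what makes kubelet mark the pod unhealthy.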

Because of this, we started using uWSGI in our pods. uWSGI can run multiple worker processes under the hood, handle multiple requests in parallel, and automatically spin up new processes on demand. But here comes another problem we have seen: uwsgi is too slow at auto-scaling the number of processes to serve incoming requests, and this causes HTTP 503 errors. Because of this we are unable to serve a few of our APIs with 100% availability.

At the same time, all our other apps, written in nodejs, java and golang, are giving 100% availability.

I am looking for the best way to run a python app with 100% (99.99%) availability in Kubernetes, with the following:

  1. Health and liveness APIs served by the app itself
  2. The app running in Kubernetes
  3. If possible, without uwsgi (a single process per pod is the fundamental Docker concept)
  4. If uwsgi is required, any specific config we can apply for a k8s environment
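On point 4, here is an illustrative uwsgi.ini for a container environment. It is a sketch, not a recommendation: every value is an assumption to tune for your workload. The ideas are a fixed worker pool (sidestepping the slow on-demand process scaling described above), die-on-term so uWSGI shuts down on the SIGTERM Kubernetes sends instead of doing its default reload, and harakiri so one stuck request cannot occupy a worker forever.

```ini
[uwsgi]
module = app:app            ; hypothetical Flask/Django WSGI entry point
http-socket = 0.0.0.0:8000
master = true
processes = 4               ; fixed pool: no on-demand process spawning lag
threads = 2
die-on-term = true          ; exit cleanly on SIGTERM (Kubernetes pod shutdown)
harakiri = 30               ; recycle any worker stuck on a request for >30s
listen = 1024               ; deeper accept backlog while all workers are busy
```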

We use Twisted's WSGI server with 30 threads and it's been solid for our Django application. It keeps to a single-process-per-pod model, which more closely matches Kubernetes' expectations, as you mentioned. Yes, the GIL means only one of those 30 threads can be running Python code at a time, but as with most webapps, most of those threads are blocked on I/O (usually waiting for a response from the database) the vast majority of the time. Then run multiple replicas on top of that, both for redundancy and to give you true concurrency at whatever level you need (we usually use 4-8 depending on the site traffic; some big ones go up to 16).
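The point about I/O-blocked threads can be sketched with the stdlib alone, no Twisted required: a threaded single-process server finishes ten slow, I/O-bound requests in roughly the time of one, because waiting threads release the GIL. (ThreadingHTTPServer spawns a thread per request rather than using a fixed 30-thread pool, but the principle is the same.)

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(0.3)  # stand-in for a database round trip; releases the GIL
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # silence per-request logging
        pass

server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

start = time.time()
clients = [
    threading.Thread(target=urllib.request.urlopen,
                     args=(f"http://127.0.0.1:{port}/",))
    for _ in range(10)
]
for c in clients:
    c.start()
for c in clients:
    c.join()
elapsed = time.time() - start  # near 0.3s, nowhere near the 3s a serial server needs
server.shutdown()
```

A CPU-bound handler would break this picture, which is exactly the question's problem: for CPU-heavy endpoints, threads do not help and you need more processes or replicas.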

I have exactly the same problem with a python deployment running a Flask application. Most api calls are handled in a matter of seconds, but there are some cpu-intensive requests that hold the GIL for 2 minutes... The pod keeps accepting requests, ignores the configured timeouts, ignores a connection closed by the user; then, after 1 minute of liveness probes failing, the pod is restarted by kubelet.

So 1 fat request can dramatically drop the availability.
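That "restarted after 1 minute" behaviour follows directly from the probe settings. An illustrative livenessProbe (the /healthz path and all values are assumptions) where failureThreshold times periodSeconds gives roughly 60 seconds before kubelet restarts the container:

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # hypothetical health endpoint
    port: 8000
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 6     # 6 failures x 10s period = ~60s until restart
```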

I see two different solutions: 1) have a separate deployment that hosts only the long-running api calls, and configure ingress to route requests between the two deployments; 2) use multiprocessing to handle liveness/readiness probes in a main process, while every other request is handled in a child process.

There are pros and cons to each solution; maybe I will need a combination of both. Also, if I need a steady flow of prometheus metrics, I might need to create a proxy server on the application layer (one more container in the same pod). I also need to configure ingress to keep a single upstream connection to the python pods, so that long-running requests get queued while short ones are processed concurrently (yep, python, concurrency, good joke). Not sure though whether it will scale well with HPA.

So yeah, running a production-ready python rest api server on kubernetes is not a piece of cake. Go and java have a much better ecosystem for microservice applications.

PS: here is a good article that shows there is no need to run your app in kubernetes with WSGI: https://techblog.appnexus.com/beyond-hello-world-modern-asynchronous-python-in-kubernetes-f2c4ecd4a38d

PPS: I'm considering using a prometheus exporter for flask. It looks better than running a python client in a separate thread: https://github.com/rycus86/prometheus_flask_exporter
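The wiring for that library is small. A sketch of its documented entry point, PrometheusMetrics, which attaches default per-request metrics and a /metrics endpoint to an existing Flask app (assumes flask and prometheus-flask-exporter are installed; the /ping route is illustrative):

```python
from flask import Flask
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
metrics = PrometheusMetrics(app)  # registers default request metrics and GET /metrics

@app.route("/ping")
def ping():
    return "pong"

# Run under any WSGI server; /metrics then serves the Prometheus exposition format.
```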
