
Scaling a node.js websocket server that does heavy computations

I have a node.js server with one websocket connection that receives a high volume of messages. The server also does heavy CPU work. I see the ws connection repeatedly dropping, reconnecting, dropping again, and so on. My guess is that the CPU work blocks the main thread for so long, while the ws is receiving so many messages, that the connection simply fails?

My initial solution was to move all CPU work into a node worker_thread, which helped a bit, but I'm still seeing the ws lose its connection a lot. My thinking was that node is supposed to be very efficient at network I/O, so if I moved all the CPU work onto another thread, the network work wouldn't be blocked by it.

The server is stateful, and there can only be one instance of it, so I can't just spin up more.

I'm not really sure how to proceed. Some ideas are:

  1. Move the CPU work into another process and communicate through some inter-process communication method. But how is this better than a worker_thread?
  2. Horizontally scale the websockets, so if one fails, the others pick up the slack. Scaling websockets this way seems pretty complicated.

Number one -

//The server is stateful, there can only be 1 of this server, so I can't just spin up more.

You should remove this bottleneck. You will get nowhere as long as you have it. The whole idea of scaling is distributing the network and CPU workloads among replicas of your application.

If your server is stateful, create a state-controller server which handles all the state information. Spin up replicas of your application and establish intra-cluster communication between the replicas and the state-controller server.

Once this setup is done, create a load balancer which can check the readiness of each replica individually and forward traffic only to the available ones. Don't forget that in Node.js, network handling happens at the kernel level, so the OS will queue incoming connections at the front line and keep requests waiting until a workload is ready to handle them. This setup lets you control thresholds such as the number of replicas and readiness timeouts, which clears the way for performance fine-tuning. The right values depend on factors such as how large your request data and response data are, how long processing takes, and so on.
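As a concrete illustration, the readiness check described above might look like this in Kubernetes (the container name, image, port, and `/healthz` path are all assumptions for the sketch):

```yaml
# Illustrative readiness probe for one ws replica: the kubelet only routes
# Service traffic to pods whose probe succeeds, so an overloaded replica
# temporarily failing /healthz sheds load to the other replicas.
containers:
  - name: ws-server
    image: my-ws-server:latest
    ports:
      - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 2
```

The `periodSeconds` and `timeoutSeconds` knobs are exactly the "readiness timeouts" mentioned above.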

The good news is that almost all container orchestration systems provide all of the above. Mostly, you will only have to build the state-handling server yourself.

https://kubernetes.io/docs/concepts/cluster-administration/networking/
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
