
Scaling a node.js websocket server that does heavy computations

I have a node.js server with one websocket connection that receives a high volume of messages. This server also does heavy CPU work. I see the ws connection repeatedly drop, reconnect, drop again, and so on. My guess is that the CPU work blocks the main thread so much, while the ws is receiving so many messages, that the connection simply fails. Is that plausible?

My initial solution was to move all CPU work into a node worker_thread, which helped a bit, but I'm still seeing the ws lose its connection a lot. My thinking was that node is supposed to be very efficient at network I/O, so if I moved all the CPU work into another thread, the network work wouldn't be blocked by it.

The server is stateful, there can only be 1 of this server, so I can't just spin up more.

I'm not really sure how to proceed. Some ideas are:

  1. Move the CPU work into another process and communicate through some inter-process communication method. But how is this better than a worker_thread?
  2. Horizontally scale the websockets, so if one fails, the others will pick up the slack. Scaling websockets in this way seems pretty complicated.

Number one -

> The server is stateful, there can only be 1 of this server, so I can't just spin up more.

You should remove this bottleneck. You will get nowhere as long as you have it. The whole idea of scaling is to distribute the network and CPU workloads among replicas of your application.

If your server is stateful, create a state controller server that owns all the state information. Spin up replicas of your application and establish intra-cluster communication between the replicas and the state controller.

Once this setup is done, put a load balancer in front that checks the readiness of each replica individually and forwards traffic only to the available ones. Don't forget that in Node.js, network handling happens at the kernel level, so the OS will queue incoming requests at the front line until a workload is ready to handle them. This setup lets you control thresholds such as the number of replicas and readiness timeouts, which clears the way for performance fine-tuning. The right values depend on factors such as request size, response size, and processing time.

The good news is that almost all container orchestration systems provide all of the above. Mostly, the only part you will have to build yourself is the server that handles the state.

https://kubernetes.io/docs/concepts/cluster-administration/networking/
https://kubernetes.io/docs/concepts/workloads/controllers/deployment/
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/


 