
How to get zero downtime with Socket.io / Node.js server?

I have a Node.js web server running with Socket.io. I found that if a single error occurs in the script, the entire server crashes. So I'm trying to find a way to keep the server up and running in cases like this once the app goes into production. I found one answer that seemed promising, but it didn't solve my particular problem when I tried implementing it in my code: How do I prevent node.js from crashing? try-catch doesn't work

EDIT:

What I've fixed so far: I now have PM2 auto-restarting the script after a crash, and I have Redis set up with my user session data stored in it.

My code is currently set up like this:

EDIT #2: After studying and working on the code all day, I edited it a second time to include "sticky-session" logic. After the edit, there are no longer strange socket connections every second, and it seems (though I'm not completely sure) that the sockets are all in sync with the workers. When the script crashes, the app (not PM2) spawns a new process, which seems good. However, when a worker crashes, users still have to refresh the page to refresh their session and get new sockets, which is a big problem...

var fs = require('fs'),
  https = require('https'),
  express = require('express'),
  cluster = require('cluster'), // not really sure how to use this
  net = require('net'), // not really sure what to do here
  socketio = require('socket.io'),
  io_redis = require('socket.io-redis'), // not really sure how to use this
  sticky = require('sticky-session');

var options = {
  key: fs.readFileSync('/path/to/privkey.pem'),
  cert: fs.readFileSync('/path/to/fullchain.pem')
};

var app = express();
// Report which worker answered the request
app.get('/', function(req, res) {
  res.end('worker: ' + cluster.worker.id);
});
var server = https.createServer(options, app);

if (!sticky.listen(server, 3000)) {
  // Master code
  // sticky.listen() forks the workers itself, so no cluster.fork() loop is needed here
  server.once('listening', function() {
    console.log('server started on port 3000');
  });
}
else {
  // Worker code
  var io = socketio(server),
    getUser = require('./lib/getUser'),
    loginUser = require('./lib/loginUser'),
    authenticateUser = require('./lib/authenticateUser'),
    client = require('./lib/redis'); // connect to redis

  // Relay socket events between workers through Redis
  io.adapter(io_redis({host: 'localhost', port: 6379}));

  client.on("error", function(err) {
    console.log("Error " + err);
  });

  io.on('connection', function(socket){
    // LOTS OF SOCKET EVENTS / REDIS USER SESSION MANAGEMENT / APP
  });
}

I tried using "cluster", but I'm not sure how to get it working properly, since it involves multiple "workers", and I believe the sockets get mixed up between them. I'm not even sure which parts of my code ("require" calls, etc.) go in which "cluster" code block (Master/Worker), or how to keep the sockets in sync. Something just isn't right.

I'm assuming I need to use the npm packages socket.io-redis and/or sticky-session to keep the sockets in sync (I'm not sure how to implement this)? Unfortunately, there just aren't any good examples on the internet, or in the books I'm reading, for clustering socket.io with node.js.
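To make "keeping the sockets in sync" concrete, here is a rough model of the idea behind a Redis-backed adapter: each worker only knows its own sockets, so a message for a socket on another worker is published to a shared channel that every worker listens on. This is only a sketch of the concept, not the socket.io-redis API; the in-process `bus` stands in for Redis pub/sub, and `createWorker`, `sendTo` and `inboxOf` are hypothetical names.

```javascript
const { EventEmitter } = require('events');

// Stand-in for Redis pub/sub: every worker subscribes to the same bus.
const bus = new EventEmitter();

// Hypothetical model of one cluster worker and the sockets connected to it.
function createWorker(name) {
  const localSockets = new Map(); // socket id -> array of received messages

  // Deliver anything published on the bus to the matching local socket, if any.
  bus.on('message', function (socketId, payload) {
    const inbox = localSockets.get(socketId);
    if (inbox) {
      inbox.push(payload);
    }
  });

  return {
    name: name,
    connect: function (socketId) {
      localSockets.set(socketId, []);
    },
    // Publish instead of delivering directly, so the worker that owns
    // the target socket handles it, whichever worker that is.
    sendTo: function (socketId, payload) {
      bus.emit('message', socketId, payload);
    },
    inboxOf: function (socketId) {
      return localSockets.get(socketId);
    }
  };
}

const workerA = createWorker('A');
const workerB = createWorker('B');
workerA.connect('alice');
workerB.connect('bob');

// Alice's worker sends to Bob, whose socket lives on the other worker.
workerA.sendTo('bob', 'hi bob');
console.log(workerB.inboxOf('bob')); // [ 'hi bob' ]
```

sticky-session addresses a different part of the problem: it routes a given client to the same worker each time, so that worker is the one holding that client's connection state.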

Can someone provide a basic code example showing which parts of my code go where, or how to implement things? I would greatly appreciate it. The goals are:

1) If the server (node cluster process) crashes, the sockets should still work after a restart (or after another worker spawns).

For example, if two users (two sockets) are having a private message conversation and a crash happens, the messages should still be delivered after PM2 auto-restarts the script (spawns a new cluster process). The problem I have: if the server crashes, messages stop getting sent to users, even after an auto-restart.

2) Sockets should all stay in sync across the different cluster processes.
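One possible direction for goal 1, sketched under some assumptions: the client reconnects on its own after the restart (the Socket.IO client does this by default), it presents a session token on every connection, and messages are addressed by user id instead of by a socket id that dies with the process. The `sessions` map stands in for the Redis session store, and `createRegistry`, `attach` and `sendToUser` are hypothetical names, not part of any library.

```javascript
// Stand-in for the Redis session store: token -> user id.
// In a real deployment this data would live in Redis so it survives a crash.
const sessions = new Map([['token-1', 'alice'], ['token-2', 'bob']]);

// Hypothetical per-process registry, rebuilt from scratch after every restart.
function createRegistry() {
  const socketsByUser = new Map(); // user id -> socket-like object

  return {
    // Call on every connection, including automatic reconnections.
    attach: function (socket, token) {
      const userId = sessions.get(token);
      if (!userId) {
        return false; // unknown session: the client must log in again
      }
      socketsByUser.set(userId, socket);
      return true;
    },
    // Address messages by user id, never by a stale socket id.
    sendToUser: function (userId, message) {
      const socket = socketsByUser.get(userId);
      if (!socket) {
        return false;
      }
      socket.received.push(message);
      return true;
    }
  };
}

// Before the crash.
let registry = createRegistry();
const bobSocket = { received: [] };
registry.attach(bobSocket, 'token-2');
registry.sendToUser('bob', 'first');

// Simulated crash: the process restarts and the registry is empty again.
registry = createRegistry();
console.log(registry.sendToUser('bob', 'lost')); // false

// The client reconnects with the same token and gets a new socket.
const bobSocket2 = { received: [] };
registry.attach(bobSocket2, 'token-2');
registry.sendToUser('bob', 'second');
console.log(bobSocket2.received); // [ 'second' ]
```

The point of the sketch is that nothing the user needs is kept only in process memory: the session outlives the crash, and the socket mapping is rebuilt when the client reconnects, without a page refresh.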

How to get zero downtime with …

You don't.

It's simply not possible with anything. You're asking the wrong questions. Try these:

  • How do I catch and handle errors I can predict?
  • How do I gracefully fail when there are errors I cannot predict?
  • How can I usefully separate errors in my application vs. errors in how clients interact with it?
  • How can I build a distributed system?
  • How do I deploy and scale a system with fault tolerance in mind?
  • I have [single point of failure XYZ], how do I distribute [XYZ] to remove it?
  • What systems monitoring is useful for [some technology]?
  • How do I set up automation for [recurring problem X]?

etc. etc.
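As a rough illustration of the first two questions, here is a minimal sketch. `safeHandler` wraps an event handler so that an error you can predict (malformed client input, for instance) is reported instead of taking the process down, and `failGracefully` covers the errors you cannot predict by closing the server and exiting with a non-zero code so that a supervisor such as PM2 can start a clean process. Both function names and the `onError` and `exit` parameters are hypothetical, and the `EventEmitter` only stands in for a socket.

```javascript
const { EventEmitter } = require('events');

// Wrap an event handler so a predictable failure is reported instead of
// crashing the process. `onError` is a hypothetical reporting callback.
function safeHandler(handler, onError) {
  return function () {
    try {
      return handler.apply(this, arguments);
    } catch (err) {
      onError(err);
    }
  };
}

// For errors nobody predicted: stop accepting work, then exit with a
// non-zero code so a supervisor can start a fresh process.
// `exit` is injected so the function can be exercised without killing the process.
function failGracefully(server, exit) {
  return function (err) {
    console.error('fatal:', err.message);
    server.close(function () {
      exit(1);
    });
  };
}

// Usage with a socket-like emitter.
const socket = new EventEmitter();
const errors = [];
socket.on('message', safeHandler(function (text) {
  JSON.parse(text); // throws on malformed input
}, function (err) {
  errors.push(err.name);
}));
socket.emit('message', '{not json');
console.log(errors); // [ 'SyntaxError' ]
```

In a real server the second function might be registered as `process.on('uncaughtException', failGracefully(server, function (code) { process.exit(code); }))`. A fuller version would probably also add a timeout, since long-lived connections can keep `server.close` from completing.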


 