
In node.js, how to declare a shared variable that can be initialized by master process and accessed by worker processes?

I want the following:

  • During startup, the master process loads a large table from file and saves it into a shared variable. The table has 9 columns and 12 million rows, 432 MB in size.
  • The worker processes run an HTTP server, accepting real-time queries against the large table.

Here is my code, which obviously does not achieve my goal.

var my_shared_var;
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Load a large table from file and save it into my_shared_var,
  // hoping the worker processes can access this shared variable,
  // so that the worker processes do not need to reload the table from file.
  // The loading typically takes 15 seconds.
  my_shared_var = load('path_to_my_large_table');

  // Fork worker processes
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // The following line of code actually outputs "undefined".
  // It seems each process has its own copy of my_shared_var.
  console.log(my_shared_var);

  // Then perform query against my_shared_var.
  // The query should be performed by worker processes,
  // otherwise the master process will become bottleneck
  var result = query(my_shared_var);
}

I have tried saving the large table into MongoDB so that each process can easily access the data. But the table is so huge that it takes MongoDB about 10 seconds to complete my query even with an index. This is too slow and not acceptable for my real-time application. I have also tried Redis, which holds data in memory. But Redis is a key-value store and my data is a table. I also wrote a C++ program to load the data into memory, and the query took less than 1 second, so I want to emulate this in node.js.

To put your question in a few words: you need to share data from the MASTER entity with the WORKER entities. This can be done very easily using events:

From master to worker:

worker.send({ /* json data */ });        // in the master

process.on('message', yourCallbackFunc); // in the worker; pass the function, do not invoke it

From worker to master:

process.send({ /* json data */ });      // in the worker

worker.on('message', yourCallbackFunc); // in the master

I hope this way you can send and receive data bidirectionally. Please mark it as answer if you find it useful so that other users can also find the answer. Thanks

You are looking for shared memory, which node.js just does not support. You should look for alternatives, such as querying a database or using memcached.

If read-only access is fine for your application, try out my own shared memory module. It uses mmap under the covers, so data is loaded as it's accessed and not all at once. The memory is shared among all processes on the machine. Using it is super easy:

const Shared = require('mmap-object')

const shared_object = new Shared.Open('table_file')

console.log(shared_object.property)

It gives you a regular object interface to a key-value store of strings or numbers. It's super fast in my applications.

There is also an experimental read-write version of the module available for testing.

In node.js, fork does not work like fork in C++. It does not copy the current state of the process; it starts a new process. So in this case the variable is not shared. Every line of code runs in every process, but the master process has the cluster.isMaster flag set to true. You need to load your data in every worker process. Be careful if your data is really huge, because every process will have its own copy. I think you should query parts of the data as soon as you need them, or wait if you really need it all in memory.

You can use Redis.

Redis is an open source, BSD licensed, advanced key-value cache and store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets, sorted sets, bitmaps and hyperloglogs.

redis.io
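One common way to fit a table into Redis's key-value model is one hash per row. The sketch below builds that shape with plain objects; the column names and the row:<id> key scheme are assumptions, and with a client such as ioredis the same shape would be written with HSET and read back with HGETALL:

```javascript
// Illustrative columns and rows standing in for the 9-column table.
const columns = ['id', 'name', 'price'];
const rows = [
  [1, 'apple', 3],
  [2, 'pear', 5],
];

// One hash per row, keyed by row id, one field per column --
// the shape a Redis client would store with HSET / fetch with HGETALL.
const hashes = {};
for (const row of rows) {
  const fields = {};
  columns.forEach((col, i) => { fields[col] = row[i]; });
  hashes[`row:${row[0]}`] = fields;
}

console.log(hashes['row:2'].name); // prints "pear"
```

Lookups by row id are then a single hash fetch; anything else (range scans, multi-column filters) needs secondary index keys, which is where a key-value store starts to fight the tabular model.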

This way works to "share a variable"; it is a bit fancier than the approach @Shivam presented. However, the module internally uses the same API. Therefore "shared memory" is a bit misleading, since in cluster each process is a fork of the parent process. At fork time, process memory is duplicated in OS memory, so there is no real shared memory except low-level mechanisms like an shm device or a virtual shared memory page (Windows). I did implement a native module for Node.js that uses native shared memory (which is real shared memory): with that technique both processes read directly from an OS shared memory section. However, that solution doesn't really apply here because it is limited to scalar values. You could of course JSON.stringify and share the serialized JSON string, but the time it takes to parse/stringify is totally non-ideal for most use cases. (Especially for larger objects, JSON parsing/stringifying with standard library implementations becomes non-linear.)

Thus, this solution seems the most promising for now:

const cluster = require('cluster');
require('cluster-shared-memory');

if (cluster.isMaster) {
  for (let i = 0; i < 2; i++) {
    cluster.fork();
  }
} else {
  const sharedMemoryController = require('cluster-shared-memory');
  // await is only valid inside an async function,
  // so wrap the worker logic in an async IIFE.
  (async () => {
    // Note: it must be a serializable object
    const obj = {
      name: 'Tom',
      age: 10,
    };
    // Set an object
    await sharedMemoryController.set('myObj', obj);
    // Get an object
    const myObj = await sharedMemoryController.get('myObj');
    // Mutually exclusive access
    await sharedMemoryController.mutex('myObj', async () => {
      const newObj = await sharedMemoryController.get('myObj');
      newObj.age = newObj.age + 1;
      await sharedMemoryController.set('myObj', newObj);
    });
  })();
}
