
Parallelizing tasks in Node.js

I have some tasks I want to do in JS that are resource intensive. For this question, let's assume they are heavy calculations rather than system access. Now I want to run tasks A, B and C at the same time, and execute some function D when they are done.

The async library provides nice scaffolding for this:

async.parallel([A, B, C], D);

If what I am doing is just calculations, then this will still run synchronously (unless the library is putting the tasks on different threads itself, which I expect is not the case). How do I make this be actually parallel? What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?

How do I make this be actually parallel?

First, you won't really be running in parallel within a single node application. A node application runs your JavaScript on a single thread, and node's event loop processes only one event at a time. Even on a multi-core box you won't get parallel processing within a node application.
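To make the single-thread point concrete, here is a minimal sketch. `busyWork` and `measureTimerDelay` are hypothetical names made up for this illustration, and the spin loop is only a stand-in for a real heavy calculation. A timer is requested for 10 ms, but it cannot fire until the calculation gives the thread back.

```javascript
// Hypothetical stand-in for a heavy calculation: spin for roughly `ms` milliseconds.
function busyWork(ms) {
  const end = Date.now() + ms;
  let iterations = 0;
  while (Date.now() < end) {
    iterations++;
  }
  return iterations;
}

// Resolve with how long a 10 ms timer actually took to fire
// when the thread is kept busy for `busyMs` milliseconds.
function measureTimerDelay(busyMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    // Ask for a 10 ms timer...
    setTimeout(() => resolve(Date.now() - scheduledAt), 10);
    // ...then block the thread; the callback above has to wait for this to return.
    busyWork(busyMs);
  });
}

measureTimerDelay(200).then((delay) => {
  console.log(`timer requested for 10 ms, fired after ${delay} ms`);
});
```

The reported delay should be at least as long as the busy period, since the timer callback only gets a turn once the synchronous work has finished.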

That said, you can get processing parallelism on a multicore machine by forking the code into separate node processes or by spawning child processes. This, in effect, allows you to create multiple instances of node itself and to communicate with those processes in different ways (e.g. stdout, the process fork IPC mechanism). Additionally, you could choose to separate the functions (by responsibility) into their own node app/server and call them via RPC.
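A rough sketch of the fork-plus-IPC route, under some assumptions: the worker here is a made-up script that sums the integers 1..n, and it is written to a temp file only so the example fits in one file. In a real project it would be an ordinary module next to your code. `runInChild` and `runAll` are hypothetical helper names.

```javascript
const { fork } = require('child_process');
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical worker: sums the integers 1..n and sends the result back over IPC.
const workerSource = `
process.on('message', (n) => {
  let sum = 0;
  for (let i = 1; i <= n; i++) sum += i;
  process.send(sum);
});
`;
const workerPath = path.join(os.tmpdir(), `sum-worker-${process.pid}.js`);
fs.writeFileSync(workerPath, workerSource);

// Run one task in its own node process and resolve with its result.
function runInChild(n) {
  return new Promise((resolve, reject) => {
    const child = fork(workerPath);
    child.once('message', (result) => {
      child.kill(); // the worker would otherwise stay alive waiting for messages
      resolve(result);
    });
    child.once('error', reject);
    child.send(n);
  });
}

// Tasks A, B and C each get a separate process; D runs once all have replied.
function runAll(inputs) {
  return Promise.all(inputs.map(runInChild));
}

runAll([1e6, 2e6, 3e6]).then((results) => {
  console.log('D received', results);
});
```

Starting a process per task has a noticeable startup cost, so this only pays off when each task is heavy enough to outweigh it.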

What is the thing done typically by async code to not block the caller (when working with NodeJS)? Is it starting a child process?

It is not starting a new process. Underneath, when async.parallel is used in node.js, it is using process.nextTick(). And nextTick() allows you to avoid blocking the caller by deferring work onto a new stack so you can interleave CPU-intensive tasks, etc.
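A sketch of the deferring idea: split a long calculation into chunks and give control back between them. Note that this uses setImmediate rather than process.nextTick, which is my substitution, not what async.parallel does: callbacks queued with nextTick run before the event loop continues, so they would not let timers or I/O callbacks in between chunks. `sumInChunks` is a hypothetical helper name.

```javascript
// Sum the integers 0..n-1 in chunks, yielding to the event loop between chunks.
function sumInChunks(n, chunkSize, done) {
  let i = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(i + chunkSize, n);
    for (; i < end; i++) total += i;

    if (i < n) {
      // Let pending timers and I/O callbacks run before the next chunk.
      setImmediate(runChunk);
    } else {
      done(total);
    }
  }

  runChunk();
}

sumInChunks(1e7, 1e5, (total) => console.log('sum:', total));
```

This keeps the process responsive, but it is interleaving on one thread, not parallelism: the total amount of CPU work is unchanged.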

Long story short

Node doesn't make it easy "out of the box" to achieve multiprocessor concurrency. Node instead gives you a non-blocking design and an event loop that leverages a thread without sharing memory. Multiple threads cannot share data/memory, therefore locks aren't needed. Node is lock free. One node process leverages one thread, and this makes node both safe and powerful.

When you need to split work up among multiple processes, use some sort of message passing to communicate with the other processes/servers, e.g. IPC/RPC.


For more see:

Awesome answer from SO on What is Node.js ... with tons of goodness.

Understanding process.nextTick()

Asynchronous and parallel are not the same thing. Asynchronous means that you don't have to wait for synchronization. Parallel means that you can be doing multiple things at the same time. Node.js is only asynchronous; it only ever has 1 thread, so it can only work on 1 thing at once. If you have a long-running computation, you should start another process and then just have your node.js process asynchronously wait for results.

To do this you could use child_process.spawn and then read data from the child's stdout.

http://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options

var spawn = require('child_process').spawn;
// run the external computation as a separate process
var process2 = spawn('sh', ['./computationProgram', 'parameter']);

process2.stderr.on('data', function (data) {
    // handle error output from the child
});

process2.stdout.on('data', function (data) {
    // handle result data from the child (may arrive in several chunks)
});
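For something you can run as-is, here is a variant under a few assumptions: instead of ./computationProgram, the child is another node process running a made-up inline script that prints the sum of 1..n. The `compute` name and the script are hypothetical. Output is accumulated until 'close', because 'data' events can deliver the result in pieces.

```javascript
const { spawn } = require('child_process');

// Hypothetical computation program: an inline node script that prints the
// sum of 1..n, where n is its last command-line argument.
const script = `
const n = Number(process.argv[process.argv.length - 1]);
let sum = 0;
for (let i = 1; i <= n; i++) sum += i;
console.log(sum);
`;

// Run the computation in a child process and resolve with the parsed result.
function compute(n) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', script, String(n)]);
    let out = '';
    let err = '';

    child.stdout.on('data', (chunk) => { out += chunk; });
    child.stderr.on('data', (chunk) => { err += chunk; });
    child.on('error', reject);

    // 'close' fires after the process has exited and its streams have ended.
    child.on('close', (code) => {
      if (code === 0) resolve(Number(out.trim()));
      else reject(new Error(`child exited with code ${code}: ${err}`));
    });
  });
}

compute(100).then((sum) => console.log('result from child:', sum));
```

Wrapping the child in a promise also makes it easy to start several at once and wait for all of them with Promise.all.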

Keep in mind I/O is parallelized by Node.js; only your JavaScript callbacks are single threaded.

Assuming you are writing a server, an alternative to adding the complexity of spawning processes or forking is to simply build stateless node servers and run an instance per core, or better yet run many instances, each in its own virtualized micro server. Coordinate incoming requests using a reverse proxy or load balancer.

You could also offload computation to another server, maybe MongoDB (using MapReduce) or Hadoop.

To be truly hardcore, you could write a Node plugin in C++ and have fine-grained control over parallelizing the computation code. The speedup from C++ might remove the need for parallelization anyway.

You can always write the computationally intensive tasks in another language better suited to numeric computation and, for example, expose them through a REST API.

Finally, you could perhaps run the code on the GPU using node-cuda or something similar, depending on the type of computation (not all computations can be optimized for the GPU).

Yes, you can fork and spawn other processes, but it seems to me one of the major advantages of node is not having to worry much about parallelization and threading, and therefore bypassing a great amount of complexity altogether.

Depending on your use case you can use something like

task.js: a simplified interface for getting CPU-intensive code to run on all cores (node.js and web)

An example would be

function blocking (exampleArgument) {
    // block thread
}

// turn blocking pure function into a worker task
const blockingAsync = task.wrap(blocking);

// run task on an autoscaling worker pool
blockingAsync('exampleArgumentValue').then(result => {
    // do something with result
});

I just recently came across parallel.js; it seems to actually use multiple cores and also has map-reduce-style features. http://adambom.github.io/parallel.js/

Notice: technical posts on this site are published under the CC BY-SA 4.0 license; if you republish them, please credit this site's URL or the original address.
