
Multi-threading for zip in Node.js

Can zip and unzip operations be made multithreaded in Node.js?

There are a bunch of modules like yauzl, but none of them uses multiple threads, and you can't start multiple threads yourself with node-cluster or something like that, because each zip file must be handled in a single thread.

According to the zlib documentation:

Threadpool Usage: All zlib APIs, except those that are explicitly synchronous, use libuv's threadpool. This can lead to surprising effects in some applications, such as subpar performance (which can be mitigated by adjusting the pool size) and/or unrecoverable and catastrophic memory fragmentation. https://nodejs.org/api/zlib.html#zlib_threadpool_usage

According to the libuv threadpool documentation, you can set the environment variable UV_THREADPOOL_SIZE to change the size of the pool (the default is 4 threads).
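A minimal sketch of what this means in practice, assuming a plain CommonJS script: several asynchronous `zlib.gzip` calls started at once, each of which is queued on libuv's threadpool, so up to `UV_THREADPOOL_SIZE` of them can compress in parallel. The helper name `gzipMany` is hypothetical. Note that each individual call still runs on a single thread, so this helps with many files rather than with one large file.

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

// The callback-based zlib.gzip runs on libuv's threadpool, so calls that are
// in flight at the same time can compress on different threads
// (up to UV_THREADPOOL_SIZE threads, 4 by default).
const gzip = promisify(zlib.gzip);

// Hypothetical helper: compresses every buffer concurrently and returns the
// gzipped buffers in the same order as the input.
async function gzipMany(buffers) {
  return Promise.all(buffers.map((buf) => gzip(buf)));
}
```

To give the pool more threads, set the variable when launching the process, e.g. `UV_THREADPOOL_SIZE=8 node app.js`; libuv reads it when the pool is first created, so setting it at launch is the reliable option.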

If you instead wish to compress many small files at the same time, you can use Worker Threads: https://nodejs.org/api/worker_threads.html

On reading your question again, it seems like you want multiple files. Use Worker Threads: they will not block your main thread, and you can get the output back from them via promises.
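A minimal sketch of that pattern, assuming in-memory buffers as input: one worker per input, each wrapped in a promise that resolves with the worker's output. To keep it in one file, the worker body is passed as a string with the `eval: true` option; in a real project it would usually be its own file. `gzipInWorker` and `gzipAll` are hypothetical names.

```javascript
const { Worker } = require('worker_threads');

// Worker body, evaluated as a script inside the new thread. The synchronous
// gzipSync call blocks only this worker thread, not the main thread.
const WORKER_SOURCE = `
const { parentPort, workerData } = require('worker_threads');
const zlib = require('zlib');
parentPort.postMessage(zlib.gzipSync(Buffer.from(workerData)));
`;

// Starts one worker for `input` and resolves with the gzipped bytes it posts back.
function gzipInWorker(input) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: input });
    // Buffers arrive as plain Uint8Arrays after structured cloning.
    worker.once('message', (msg) => resolve(Buffer.from(msg)));
    worker.once('error', reject);
    worker.once('exit', (code) => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}

// Compresses all inputs in parallel, one worker each, preserving input order.
async function gzipAll(inputs) {
  return Promise.all(inputs.map((input) => gzipInWorker(input)));
}
```

Spawning one worker per file is the simplest design; for hundreds of files, a fixed-size pool of workers that are reused would avoid the per-thread startup cost.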

Node.js uses libuv and worker threads. Worker threads are a way to run operations in a multi-threaded manner, while libuv maintains a thread pool whose number of threads you can increase beyond the default. You can use both to improve Node.js performance for your operation.

Here is the official documentation for worker threads: https://nodejs.org/api/worker_threads.html

See how you can increase the thread pool size in Node.js here: print libuv threadpool size in node js 8

Can zip and unzip operations be made multithreaded in Node.js?

Yes.

...and you can't start multiple threads yourself... because each zip file must be handled in a single thread

I suspect your premise is faulty. Why exactly do you think a Node process cannot start multiple threads? Here is an app I'm running which uses the very mature Node.js cluster module, with a parent process acting as a supervisor and two child processes doing heavily network- and disk-I/O-bound tasks.

[Screenshot: top output showing the Node.js processes and the CPU threads they use]

As you can see in the C column, each process is running on a separate thread. This lets the master process remain responsive for command-and-control tasks (like spawning/reaping workers) while the worker processes are CPU- or disk-bound. This particular server accepts files from the network, sometimes decompresses them, and feeds them through external file processors. In other words, it's a task that includes compression like you describe.

I'm not sure you'd want to use worker threads, based on this snippet from the docs:

Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js's built-in asynchronous I/O operations are more efficient than Workers can be.

To me, that description screams "crypto." In the past I've spawned child processes whenever I had to perform expensive crypto operations.

In another project I use Node's child_process module and kick off a new child process each time I have a batch of files to compress. That particular service sees a list of ~400 files with names like process-me-2019.11.DD.MM and concatenates them into a single process-me-2019-11-DD file. It takes a while to compress, so spawning a new process avoids blocking the main thread.
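A rough sketch of that approach, not the answerer's actual service: the parent spawns a fresh Node child per batch, the child concatenates the batch's files in order and writes one gzipped output, and the parent only waits on a promise. The child script, the `BATCH_JOB` environment variable used to hand over the job, and the `compressBatch` name are all hypothetical choices made to keep the example in one file.

```javascript
const { spawn } = require('child_process');

// Hypothetical child script: reads { out, inputs } from the BATCH_JOB
// environment variable, concatenates the input files in order and writes the
// gzipped result to `out`. Its synchronous calls block only the child process.
const CHILD_SCRIPT = `
const fs = require('fs');
const zlib = require('zlib');
const { out, inputs } = JSON.parse(process.env.BATCH_JOB);
const joined = Buffer.concat(inputs.map((file) => fs.readFileSync(file)));
fs.writeFileSync(out, zlib.gzipSync(joined));
`;

// Runs one batch in a separate Node process; resolves with the output path.
function compressBatch(inputs, out) {
  return new Promise((resolve, reject) => {
    const child = spawn(process.execPath, ['-e', CHILD_SCRIPT], {
      env: { ...process.env, BATCH_JOB: JSON.stringify({ out, inputs }) },
      stdio: 'inherit', // let the child's errors show up in the parent's console
    });
    child.once('error', reject);
    child.once('exit', (code) => {
      if (code === 0) resolve(out);
      else reject(new Error(`Child exited with code ${code}`));
    });
  });
}
```

Because each batch is a separate OS process, a crash or a long compression in one batch cannot stall the parent's event loop.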

Here is how to do multi-threading in Node.js. You will have to create the following three files:

index.mjs

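import run from './Worker.mjs';

/**
 * Design your input list of files to zip here and send them to `run` one file
 * name at a time, using a loop or similar. `run` returns a promise.
 * Example: run(<your_input>).then(<your_output>);
 **/
const files = ['a.txt', 'b.txt']; // hypothetical input list
const results = await Promise.all(files.map((file) => run(file)));
console.log(results);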

Worker.mjs

import { Worker } from 'worker_threads';

function runService(id, options) {
    return new Promise((resolve, reject) => {
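        // A relative path here is resolved against the current working directory,
        // not against this file. `workerData` is what WorkerService.mjs receives.
        const worker = new Worker('./src/WorkerService.mjs', { workerData: { id, options } });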
        worker.on('message', res => resolve({ res: res, threadId: worker.threadId }));
        worker.on('error', reject);
        worker.on('exit', code => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        });
    });
}

async function run(id, options) {
    return await runService(id, options);
}

export default run;

WorkerService.mjs

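import { workerData, parentPort } from 'worker_threads';

// Here goes your logic for zipping a file, where `workerData` will have <your_input>.
// When done, post the result back: Worker.mjs resolves its promise on 'message'.
parentPort.postMessage({ done: true, input: workerData });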
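One possible body for WorkerService.mjs, sketched under a few assumptions: the parent puts the path of the file to compress in `workerData.id` (matching the `id` parameter of `runService`), any `workerData.options` are zlib gzip options, and the output is a gzip file rather than a .zip archive (the built-in zlib module does not write .zip containers; that needs a third-party library). It is written as CommonJS so it can also be loaded on the main thread for testing; the same calls work with `import` in an .mjs file. `gzipFile` is a hypothetical name.

```javascript
const { isMainThread, parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream/promises');

// Streams `src` through gzip into `src + '.gz'` and resolves with that path.
// Streaming keeps memory use flat even for large files.
async function gzipFile(src, options = {}) {
  const dest = `${src}.gz`;
  await pipeline(fs.createReadStream(src), zlib.createGzip(options), fs.createWriteStream(dest));
  return dest;
}

// Only run the job when loaded as a worker; the result goes back to the parent.
// A failure rejects the promise, which under Node's default unhandled-rejection
// mode ends the worker and surfaces as its 'error' event in the parent.
if (!isMainThread) {
  gzipFile(workerData.id, workerData.options).then((dest) => parentPort.postMessage(dest));
}
```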

Let me know if it helps.

There is no way you can do multi-threading in pure Node.js unless you use a third-party library. You can execute the process in parallel using promises. If you don't want to overload the main thread that Node uses, you can implement a message queue such as RabbitMQ (or a Redis-based queue). It will run in its own process, so your main thread will never be blocked.
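A small in-process stand-in for the "parallel with promises" part (not RabbitMQ or Redis): run compression jobs concurrently, but cap how many are in flight so the process is not flooded with work at once. `mapWithLimit` and the limit value are illustrative choices; the jobs here use the asynchronous `zlib.gzip`, which does its work on libuv's threadpool.

```javascript
const zlib = require('zlib');
const { promisify } = require('util');

const gzip = promisify(zlib.gzip);

// Runs `task(item, index)` over `items` with at most `limit` tasks in flight,
// returning results in input order (a tiny promise-based work queue).
async function mapWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner repeatedly claims the next unclaimed index until none are left.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await task(items[i], i);
    }
  }

  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, () => runner()));
  return results;
}
```

Usage: `const gzipped = await mapWithLimit(buffers, 4, (buf) => gzip(buf));` keeps at most four compressions queued at a time, which matches the default threadpool size.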

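Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.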

 
© 2020-2024 STACKOOM.COM