Parallelization of image processing on a very large image set in MATLAB without the Parallel Computing Toolbox

I have approximately 2,500,000 images to process on a single computer. Currently I process them sequentially, passing one input image to my function to produce one output, which takes about 5 seconds per image. This obviously takes too much time. What other methods can I employ to speed up the process? I thought of starting multiple instances of MATLAB and running each on a subset of the data, but I'm not sure whether I would truly achieve parallelism that way. Are there better methods of increasing the overall speed?

As no one seems to be helping out, I thought I would have a go and see if I can get you started with some parallelisation. I do not use Windows or MATLAB, so this may need some correcting. If anyone knows better, feel free to contribute.

You can install GNU Parallel under Cygwin on Windows. There are plenty of tutorials and blogs describing the process if you search for them.

First, I'm guessing/hoping that the following command will process one image from the command line, so experiment and check that it works before you move to the next step. It assumes mfile.m defines a function that takes the image path as its argument, because run() only executes a script and cannot pass arguments to it:

matlab.exe -nodisplay -nosplash -nodesktop -r "mfile('image.jpg');exit;"

Then, to run in parallel, you will need to generate a list of all 2,500,000 JPEGs, which will look something like this:

DIR /B /S | FINDSTR /I /R "\.jpg$"

You then need to feed that list into GNU Parallel, something like this:

DIR /B /S | FINDSTR /I /R "\.jpg$" | parallel matlab.exe -nodisplay -nosplash -nodesktop -r "mfile('{}');exit;"
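Since GNU Parallel runs inside the Cygwin shell anyway, the list could probably also be built there with find rather than DIR and FINDSTR. This is only a sketch: list_jpegs is a name I made up, and the /cygdrive/c/images path in the comment is a placeholder for wherever your images live.

```shell
#!/bin/sh
# list_jpegs DIR: print every file under DIR whose name ends in .jpg
# (case-insensitive), one path per line. Hypothetical helper name.
list_jpegs() {
    find "$1" -type f -iname '*.jpg'
}

# Example (placeholder path): build the list once and reuse it for every run.
# list_jpegs /cygdrive/c/images > filelist.txt
```

Saving the list to a file means the 2,500,000-entry directory walk happens only once; GNU Parallel can then read it with its -a option instead of from a pipe.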

Obviously, test this out on a dummy directory containing copies of a few of your files, so that nothing gets broken or overwritten.
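One way to set up such a dummy directory from the Cygwin shell is sketched below. make_sample is a hypothetical helper, not part of any tool, and it assumes the sampled files have distinct base names (two files with the same name in different subdirectories would overwrite each other in the copy).

```shell
#!/bin/sh
# make_sample SRC DST N: copy the first N JPEGs found under SRC into DST.
# Only copies are made; nothing under SRC is modified.
make_sample() {
    src=$1; dst=$2; n=$3
    mkdir -p "$dst"
    find "$src" -type f -iname '*.jpg' | sort | head -n "$n" |
    while IFS= read -r f; do
        cp -- "$f" "$dst/"
    done
}

# Example (placeholder paths):
# make_sample /cygdrive/c/images /cygdrive/c/sample 20
```

Running the parallel command with --dry-run first makes GNU Parallel print the commands it would execute instead of running them, which is a cheap way to check the quoting before MATLAB is started at all.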

As @Daniel suggests, there is overhead in starting MATLAB, so it would probably be better to change your code to process all the images supplied as arguments. You could then pass 2-8 images to each invocation of MATLAB, something like this for 4 images per MATLAB job:

DIR /B /S | FINDSTR /I /R "\.jpg$" | parallel -N 4 matlab.exe -nodisplay -nosplash -nodesktop -r "mfile('{1}','{2}','{3}','{4}');exit;"
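To get a feel for why the batch size matters, here is a rough back-of-the-envelope estimate in shell arithmetic. The 2,500,000 images and 5 seconds per image come from the question; the 10-second MATLAB startup cost and the 4 parallel jobs are numbers I made up for illustration, so measure your own before trusting the result.

```shell
#!/bin/sh
# estimate_days IMAGES SEC_PER_IMAGE STARTUP_SEC BATCH JOBS
# Prints the approximate wall-clock time in whole days (rounded down),
# assuming the work spreads evenly over JOBS simultaneous MATLAB processes.
estimate_days() {
    images=$1; per_image=$2; startup=$3; batch=$4; jobs=$5
    launches=$(( (images + batch - 1) / batch ))      # MATLAB starts needed
    total=$(( images * per_image + launches * startup ))
    echo $(( total / jobs / 86400 ))                  # 86400 seconds per day
}

estimate_days 2500000 5 0 1 1     # sequential, startup ignored: 144
estimate_days 2500000 5 10 1 4    # 4 jobs, one image per MATLAB start: 108
estimate_days 2500000 5 10 4 4    # 4 jobs, 4 images per MATLAB start: 54
```

Under those assumed numbers, starting MATLAB once per image wastes most of the gain from running 4 jobs, while batching 4 images per start roughly halves the total again.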
