Best way to parallelize this Java code

How would I go about parallelizing this piece of code using Threads in Java? It extracts all the contours from an image and creates a new image containing only those contours.

import java.io.*;
import java.awt.image.*;
import javax.imageio.ImageIO;
import java.awt.Color;

public class Contornos {

static int h, w;

static long debugTime; // System.nanoTime() returns a long; a float cannot hold it precisely

public static void main(String[] args) {
    try {

        File fichImagen = new File("test.jpg");

        BufferedImage image = ImageIO.read(fichImagen);

        w = image.getWidth();
        h = image.getHeight();

        int[] inicial = new int[w * h];

        int[] resultadoR = new int[w * h];
        int[] resultadoG = new int[w * h];
        int[] resultadoB = new int[w * h];

        int[][] procesarR = new int[h][w];
        int[][] procesarG = new int[h][w];
        int[][] procesarB = new int[h][w];

        int[][] procesarBN = new int[h][w];

        int[][] binaria = new int[h][w];

        int[] resultado = new int[w * h];

        image.getRGB(0, 0, w, h, inicial, 0, w);

        for (int i = 0; i < w * h; i++) {
            Color c = new Color(inicial[i]);
            resultadoR[i] = c.getRed();
            resultadoG[i] = c.getGreen();
            resultadoB[i] = c.getBlue();
        }

        int k = 0;
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                procesarR[i][j] = resultadoR[k];
                procesarG[i][j] = resultadoG[k];
                procesarB[i][j] = resultadoB[k];
                k++;
            }
        }

        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {

                procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);

            }
        }


        binaria = extraerContornos(procesarBN);

        k = 0;
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                resultado[k++] = binaria[i][j];
            }
        }

        image.setRGB(0, 0, w, h, resultado, 0, w);
        ImageIO.write(image, "JPG", new File("allJPG.jpg"));

    } catch (IOException e) {
        e.printStackTrace(); // report read/write failures instead of silently ignoring them
    }

}

static void debugStart() {
    debugTime = System.nanoTime();
}

static void debugEnd() {
    float elapsedTime = System.nanoTime()-debugTime;

    System.out.println( (elapsedTime/1000000) + " ms ");  
}

private static int[][] extraerContornos(int[][] matriz) {
    int modx, mody;

    int[][] sobelx = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int[][] sobely = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    int[][] modg = new int[h][w];
    double[][] theta = new double[h][w];
    int[][] thetanor = new int[h][w];
    int[][] contorno = new int[h][w];

    int umbral = 10;
    int superan = 0, ncontorno = 0;
    double t;
    int signo;
    int uno, dos;

    debugStart(); // start the timer that debugEnd() reads at the end of this method

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            if (i == 0 || i == h - 1 || j == 0 || j == w - 1) {
                modg[i][j] = 0;
                theta[i][j] = 0.0;
                thetanor[i][j] = 0;
            } else {
                modx = 0;
                mody = 0;
                for (int k = -1; k <= 1; k++) {
                    for (int l = -1; l <= 1; l++) {
                        modx += matriz[i + k][j + l] * sobelx[k + 1][l + 1];
                        mody += matriz[i + k][j + l] * sobely[k + 1][l + 1];
                    }
                }
                modx = modx / 4;
                mody = mody / 4;

                modg[i][j] = (int) Math.sqrt(modx * modx + mody * mody);

                theta[i][j] = Math.atan2(mody, modx);
                thetanor[i][j] = (int) (theta[i][j] * 256.0 / (2.0 * Math.PI));
            }
        }
    }

    for (int i = 1; i < h - 1; i++) {
        for (int j = 1; j < w - 1; j++) {
            contorno[i][j] = 0;
            if (modg[i][j] >= umbral) {
                superan++;
                t = Math.tan(theta[i][j]);
                if (t >= 0.0) {
                    signo = 1;
                } else {
                    signo = -1;
                }
                if (Math.abs(t) < 1.0) {
                    uno = interpolar(modg[i][j + 1], modg[i - signo][j + 1], t);
                    dos = interpolar(modg[i][j - 1], modg[i + signo][j - 1], t);
                } else {
                    t = 1 / t;
                    uno = interpolar(modg[i - 1][j], modg[i - 1][j + signo], t);
                    dos = interpolar(modg[i + 1][j], modg[i + 1][j - signo], t);
                }
                if (modg[i][j] > uno && modg[i][j] >= dos) {
                    ncontorno++;
                    contorno[i][j] = 255;
                }
            }
        }
    }

    debugEnd();

    return contorno;

}

private static int interpolar(int valor1, int valor2, double tangente) {
    return (int) (valor1 + (valor2 - valor1) * Math.abs(tangente));
}
}

I believe I can use Threads in the extraerContornos method (for the for loops) and join() them at the end to get the results, but that's just my guess.

Would that be a correct way to parallelize this? Any tips in general on how to know when and where you should start parallelizing any code?

Tips in general on how to know when and where you should start parallelizing any code?

Well, never ever start parallelizing any code without quantitatively supported evidence that it will improve system performance.

NEVER EVER, even if any academicians or wannabe gurus tell you to do so.

First collect a fair amount of evidence that it makes any sense at all, and of how big a positive edge such code re-engineering will bring over the original, pure-[SERIAL] code-execution flow.

It is like in nature or in business: who will ever pay a single cent more for getting the same result?

Who will pay X [man*hours] of work at current salary rates for getting just a 1.01x improvement in performance (not to mention wannabe-parallel-gangstas, who manage to deliver even worse than the original performance because of the previously unseen, hidden costs of add-on overheads)? Who will ever pay for this?


How to start analysing possible benefits vs. negative impacts?

First of all, try to understand the "mechanics": how the layered, composite system, consisting of [ O/S kernel, programming language, user program ], can orchestrate going forward using either a "just"-[CONCURRENT] or a true-[PARALLEL] process-scheduling.

Without knowing this, one can never quantify the actual costs of entry. Sometimes people even pay all such costs without ever realising that the resulting processing flow is not even a "just"-[CONCURRENT] one. A typical case is forgetting the central, concurrency-preventing exclusive locking of the python GIL: it may well help mask some sorts of I/O latencies, but it never improves CPU-bound processing performance, yet one still has to pay all the immense costs of spawning full copies of the process execution environment plus the python internal state, all that for receiving nothing at the end. NOTHING. Yes, things may go that badly if poor or missing knowledge precedes a naive "go parallelize" attempt.

Ok, once you feel comfortable with the operating-system "mechanics" available for spawning threads and processes, you can guesstimate, or better benchmark, the costs of doing that and so start working quantitatively: knowing how many [ns] one will have to pay to spawn a first, second, ... thirty-ninth child thread or separate O/S process, or what the add-on costs will be for using some higher-level language construct that fans out a herd of threads/processes, distributes some amount of work and finally collects the heaps of results back to the original requestor. The high-level syntax of .map(...){...}, .foreach(...){...} et al. does all this dirty work at its lower end, hidden from the sight of the user-programme designer (not to mention "just"-coders, who do not even try to spend any effort on a fully responsible understanding of the "mechanics" and "economy" of the costs of their "just"-coded work).
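As a rough illustration of such a measurement, here is a minimal sketch that times how long it takes to start one plain java.lang.Thread and join() it again. The class name SpawnCost and the method averageSpawnJoinNanos are hypothetical, introduced only for this example, and the number it prints depends heavily on the JVM, the O/S and the hardware, so treat it as an order-of-magnitude estimate rather than a rigorous benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper, not part of the original post.
final class SpawnCost {

    // Average cost, in nanoseconds, of starting one thread and joining it.
    static long averageSpawnJoinNanos(int repetitions) throws InterruptedException {
        AtomicLong sink = new AtomicLong(); // trivial payload, so mostly overhead is timed
        long start = System.nanoTime();
        for (int i = 0; i < repetitions; i++) {
            Thread t = new Thread(sink::incrementAndGet);
            t.start();
            t.join();
        }
        long elapsed = System.nanoTime() - start;
        if (sink.get() != repetitions) {
            throw new IllegalStateException("some threads did not run");
        }
        return elapsed / repetitions;
    }

    public static void main(String[] args) throws InterruptedException {
        averageSpawnJoinNanos(1_000);           // warm-up run, result discarded
        long ns = averageSpawnJoinNanos(1_000); // measured run
        System.out.println("start() + join(): ~" + ns + " ns per thread");
    }
}
```

If the work handed to each thread takes less time than this per-thread figure, spawning the thread costs more than it can ever return.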

Without knowing the actual costs in [ns] (not depicted in Fig. 1 for clarity and brevity, although they are principally always present and are detailed and discussed in the trailer sections), it makes almost no sense for anyone to try to read and understand, in its full depth and in its code-design context, the criticism of Amdahl's Law.

It is so easy to pay more than one will receive at the end ...

For more details on this risk, check this and follow the link from the first paragraph, which leads to a fully interactive GUI simulator of the actual costs of overheads, once introduced into the costs/benefits formula.
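To make the costs/benefits argument concrete, one common way to write it down is sketched below. The symbols are assumptions chosen for this illustration and are not taken from the linked material: $p$ is the fraction of the original runtime that can be parallelized, $N$ the number of workers, and $o_{\text{setup}}$, $o_{\text{term}}$ the add-on overheads of setting up and tearing down the parallel section, each expressed as a fraction of the original serial runtime.

```latex
% Classical Amdahl's Law (no overheads):
S_{\text{classic}}(N) = \frac{1}{(1 - p) + \dfrac{p}{N}}

% Overhead-aware variant (assumed form, overheads as fractions of the serial runtime):
S(N) = \frac{1}{(1 - p) + o_{\text{setup}} + \dfrac{p}{N} + o_{\text{term}}}

% Going parallel pays off only if S(N) > 1, i.e. the denominator is below 1:
o_{\text{setup}} + o_{\text{term}} < p \left( 1 - \frac{1}{N} \right)
```

For example, with $p = 0.5$ and $N = 4$ the total overhead must stay below $0.5 \cdot 0.75 = 0.375$ of the original runtime, otherwise the "parallel" version is slower than the serial one.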

Back to your code:

The Sobel-filter kernel introduces non-local dependencies for a (naive) thread mapping, so it is better to start with a much simpler section, where absolute independence is plainly visible:

This may save all the repetitive for(){...}-constructor overheads and increase performance:

    for (     int i = 0; i < h; i++ ) {
        for ( int j = 0; j < w; j++ ) {

            Color c = new Color( inicial[i * w + j] );

            procesarBN[i][j] = (int) ( 0.2989 * c.getRed()
                                     + 0.5870 * c.getGreen()
                                     + 0.1140 * c.getBlue()
                                       );
        }
    }

Instead of these triple for(){...}-s:

    for (int i = 0; i < w * h; i++) {
        Color c = new Color(inicial[i]);
        resultadoR[i] = c.getRed();
        resultadoG[i] = c.getGreen();
        resultadoB[i] = c.getBlue();
    }

    int k = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            procesarR[i][j] = resultadoR[k];
            procesarG[i][j] = resultadoG[k];
            procesarB[i][j] = resultadoB[k];
            k++;
        }
    }

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {

            procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);

        }
    }

Effects?

In the [SERIAL] part of Amdahl's Law:

  • at net zero add-on costs: improved / eliminated 2/3 of the for(){...}-constructor looping overhead costs
  • at net zero add-on costs: improved / eliminated the ( 4 * h * w * 3 ) memIO ( i.e. not paying ~ h * w * 1.320+ [us] each !!! )
  • at net zero add-on costs: improved / eliminated the ( 4 * h * w * 3 * 4 ) memALLOCs, again saving a remarkable amount of resources in both the [TIME] and [SPACE] domains, polynomially scaled, of the complexity-ZOO taxonomy.

One may also feel safe running these in [CONCURRENT] processing, as the pixel-value processing is principally independent here (but not in the Sobel filter, nor in the contour-detector algorithm).

So, here, any [CONCURRENT] or [PARALLEL] process-scheduling may help, if

  • at some non-zero add-on cost, the processing harnesses multiple computing resources (more than the 1 CPU core that was operated in the original, pure-[SERIAL] code execution), having been safely pixel-grid mapped onto such an (available-resources-supported) thread pool or other code-processing facility.
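A minimal sketch of such a pixel-grid mapping for the grayscale step, assuming plain Threads and join() as proposed in the question. The class ParallelGray, the method toGray and the nThreads parameter are hypothetical names introduced here; the sketch also swaps `new Color(...)` for bit shifts to avoid one object allocation per pixel, which yields the same channel values for packed RGB input. Each thread writes only its own band of rows, so no locking is needed.

```java
// Hypothetical sketch, not from the original post: row-band grayscale conversion.
final class ParallelGray {

    // Converts packed RGB pixels (row-major, width w, height h) to luma values,
    // splitting the rows into contiguous bands, one band per thread.
    static int[][] toGray(int[] rgb, int w, int h, int nThreads) throws InterruptedException {
        int[][] gray = new int[h][w];
        Thread[] workers = new Thread[nThreads];
        for (int t = 0; t < nThreads; t++) {
            final int from = (int) ((long) h * t / nThreads);       // first row of this band
            final int to   = (int) ((long) h * (t + 1) / nThreads); // one past its last row
            workers[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    for (int j = 0; j < w; j++) {
                        int p = rgb[i * w + j];
                        int r = (p >> 16) & 0xFF;
                        int g = (p >> 8) & 0xFF;
                        int b = p & 0xFF;
                        gray[i][j] = (int) (0.2989 * r + 0.5870 * g + 0.1140 * b);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait for every band; join() also makes its writes visible here
        }
        return gray;
    }

    public static void main(String[] args) throws InterruptedException {
        int w = 4, h = 3;
        int[] rgb = new int[w * h];
        java.util.Arrays.fill(rgb, 0xFFFFFF); // all-white test image
        int[][] gray = toGray(rgb, w, h, 2);
        System.out.println("gray[0][0] = " + gray[0][0]);
    }
}
```

Whether this beats the single fused serial loop depends on the image size and on the thread start-up costs, so it has to be measured on the target machine.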

Yet, any attempt to go non-[SERIAL] makes sense if and only if the lump sum of all the process-allocation / deallocation et al. add-on costs gets at least justified by an increased amount of [CONCURRENT]-ly processed calculations.

Paying more than receiving is definitely not a smart move...

So benchmark, benchmark and benchmark before deciding what may have a positive effect on production code.
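A minimal sketch of a hand-rolled timing helper for that purpose. The class Bench and the method bestOfNanos are hypothetical names, and the warm-up and run counts are arbitrary; for production decisions a dedicated harness such as JMH is the more rigorous choice, since naive timing loops are easily distorted by JIT compilation and garbage collection.

```java
// Hypothetical helper, not part of the original post.
final class Bench {

    // Runs the task `warmups` times untimed, then `runs` times timed,
    // and returns the fastest timed run in nanoseconds.
    static long bestOfNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run(); // give the JIT a chance to compile the hot path
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long t0 = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - t0);
        }
        return best;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        long[] sink = new long[1]; // result is stored so the loop is not optimized away
        Runnable serial = () -> {
            long sum = 0;
            for (int v : data) {
                sum += v;
            }
            sink[0] = sum;
        };
        System.out.println("serial: " + bestOfNanos(serial, 5, 20) / 1_000_000.0 + " ms");
    }
}
```

Timing the serial version and each parallel candidate with the same helper, on the same input, gives the numbers needed to decide whether the add-on costs are justified.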

Always try to get improvements in the pure-[SERIAL] sections first, as these have zero add-on costs and yet may reduce the overall processing time.

QED above.
