
Best way to parallelize this Java code

How would I go about parallelizing this piece of code using Threads in Java? It extracts the contours from an image and creates a new image containing only those contours.

import java.io.*;
import java.awt.image.*;
import javax.imageio.ImageIO;
import java.awt.Color;

public class Contornos {

static int h, w;

static long debugTime;   // System.nanoTime() returns long; float would lose precision

public static void main(String[] args) {
    try {

        File fichImagen = new File("test.jpg");

        BufferedImage image = ImageIO.read(fichImagen);

        w = image.getWidth();
        h = image.getHeight();

        int[] inicial = new int[w * h];

        int[] resultadoR = new int[w * h];
        int[] resultadoG = new int[w * h];
        int[] resultadoB = new int[w * h];

        int[][] procesarR = new int[h][w];
        int[][] procesarG = new int[h][w];
        int[][] procesarB = new int[h][w];

        int[][] procesarBN = new int[h][w];

        int[][] binaria;   // assigned from extraerContornos() below; no need to pre-allocate

        int[] resultado = new int[w * h];

        image.getRGB(0, 0, w, h, inicial, 0, w);

        for (int i = 0; i < w * h; i++) {
            Color c = new Color(inicial[i]);
            resultadoR[i] = c.getRed();
            resultadoG[i] = c.getGreen();
            resultadoB[i] = c.getBlue();
        }

        int k = 0;
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                procesarR[i][j] = resultadoR[k];
                procesarG[i][j] = resultadoG[k];
                procesarB[i][j] = resultadoB[k];
                k++;
            }
        }

        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {

                procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);

            }
        }


        binaria = extraerContornos(procesarBN);

        k = 0;
        for (int i = 0; i < h; i++) {
            for (int j = 0; j < w; j++) {
                resultado[k++] = binaria[i][j];
            }
        }

        image.setRGB(0, 0, w, h, resultado, 0, w);
        ImageIO.write(image, "JPG", new File("allJPG.jpg"));

    } catch (IOException e) {
        e.printStackTrace();   // report I/O errors instead of silently swallowing them
    }

}

static void debugStart() {
    debugTime = System.nanoTime();
}

static void debugEnd() {
    long elapsedTime = System.nanoTime() - debugTime;

    System.out.println((elapsedTime / 1_000_000.0) + " ms");
}

private static int[][] extraerContornos(int[][] matriz) {
    debugStart();   // start the timer that debugEnd() reports at the end of this method
    int modx, mody;

    int[][] sobelx = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
    int[][] sobely = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

    int[][] modg = new int[h][w];
    double[][] theta = new double[h][w];
    int[][] thetanor = new int[h][w];
    int[][] contorno = new int[h][w];

    int umbral = 10;
    int superan = 0, ncontorno = 0;
    double t;
    int signo;
    int uno, dos;

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            if (i == 0 || i == h - 1 || j == 0 || j == w - 1) {
                modg[i][j] = 0;
                theta[i][j] = 0.0;
                thetanor[i][j] = 0;
            } else {
                modx = 0;
                mody = 0;
                for (int k = -1; k <= 1; k++) {
                    for (int l = -1; l <= 1; l++) {
                        modx += matriz[i + k][j + l] * sobelx[k + 1][l + 1];
                        mody += matriz[i + k][j + l] * sobely[k + 1][l + 1];
                    }
                }
                modx = modx / 4;
                mody = mody / 4;

                modg[i][j] = (int) Math.sqrt(modx * modx + mody * mody);

                theta[i][j] = Math.atan2(mody, modx);
                thetanor[i][j] = (int) (theta[i][j] * 256.0 / (2.0 * Math.PI));
            }
        }
    }

    for (int i = 1; i < h - 1; i++) {
        for (int j = 1; j < w - 1; j++) {
            contorno[i][j] = 0;
            if (modg[i][j] >= umbral) {
                superan++;
                t = Math.tan(theta[i][j]);
                if (t >= 0.0) {
                    signo = 1;
                } else {
                    signo = -1;
                }
                if (Math.abs(t) < 1.0) {
                    uno = interpolar(modg[i][j + 1], modg[i - signo][j + 1], t);
                    dos = interpolar(modg[i][j - 1], modg[i + signo][j - 1], t);
                } else {
                    t = 1 / t;
                    uno = interpolar(modg[i - 1][j], modg[i - 1][j + signo], t);
                    dos = interpolar(modg[i + 1][j], modg[i + 1][j - signo], t);
                }
                if (modg[i][j] > uno && modg[i][j] >= dos) {
                    ncontorno++;
                    contorno[i][j] = 255;
                }
            }
        }
    }

    debugEnd();

    return contorno;

}

private static int interpolar(int valor1, int valor2, double tangente) {
    return (int) (valor1 + (valor2 - valor1) * Math.abs(tangente));
}
}

I believe I can use Threads in the extraerContornos method (for the for loops), and join() them at the end to get the results, but that's just my guess.

Would that be a correct way to parallelize this? Any tips in general on how to know when and where you should start parallelizing any code?

Tips in general on how to know when and where you should start parallelizing any code?

Well,
never ever start parallelizing any code without quantitatively supported evidence that it will improve system performance.

NEVER EVER,
even if academics or wannabe gurus tell you to do so.

First collect a fair amount of evidence that it makes any sense at all, and how big a positive edge such code re-engineering will bring over the original, pure- [SERIAL] code-execution flow.

It is like in nature or in business -- who will ever pay a single cent more for getting the same result?

Who will pay X [man*hours] of work at current salary rates for getting just a 1.01x improvement in performance ( not speaking of wannabe-parallel efforts that manage to deliver even worse than the original performance, because of the hidden, unseen-before costs of add-on overheads ) -- who will ever pay for this?


How to start analysing possible benefits vs. negative impacts?

First of all, try to understand the "mechanics" of how the layered, composite system -- consisting of [ O/S-kernel, programming language, user program ] -- can orchestrate execution using either "just"- [CONCURRENT] or true- [PARALLEL] process-scheduling.

Without knowing this, one can never quantify the actual costs of entry, and sometimes people even pay all those costs without ever realising that the resulting processing-flow never becomes even a "just"- [CONCURRENT] processing ( as happens if one fails to understand the central "concurrency-preventing-by-exclusive-LOCK-ing" blocking of the Python GIL: it may well help mask some sorts of I/O-latencies, but it never improves CPU-bound processing-performance, while everyone still pays the immense costs of spawning full copies of the process execution-environment + Python-internal state -- all that for receiving nothing at the end. NOTHING. Yes, things can go that badly if poor or missing knowledge precedes a naive "go parallelize" activism ).

Ok, once you feel comfortable with the operating-system "mechanics" available for spawning threads and processes, you can guesstimate, or better benchmark, the costs of doing that -- so as to start working quantitatively -- knowing how many [ns] one will have to pay to spawn the first, second, ... thirty-ninth child thread or separate O/S process, or what the add-on costs will be for some higher-level language construct that fans out a herd of threads/processes, distributes some amount of work and finally collects the heaps of results back to the original requestor ( using just the high-level syntax of .map(...){...} , .foreach(...){...} et al, which at their lower ends do all the dirty work hidden from the sight of the user-program designer -- not speaking of "just"-coders, who spend zero effort on a fully responsible understanding of the "mechanics" + "economy" of the costs of their "just"-coded work ).
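
A quick way to get such numbers for your own machine is a micro-benchmark along the following lines -- a minimal sketch, assuming an otherwise idle JVM ( naive nanoTime() loops suffer from JIT warm-up and scheduler noise, so treat the result as an order-of-magnitude figure, or use a proper harness ):

    // Minimal sketch of measuring the per-thread spawn + join() overhead.
    // All names here are illustrative; the payload is intentionally empty,
    // so only the pure thread-management overhead is timed.
    public class ThreadSpawnCost {

        public static void main(String[] args) throws InterruptedException {
            final int N = 1_000;                 // number of spawn/join cycles to average over
            long total = 0;

            for (int i = 0; i < N; i++) {
                long t0 = System.nanoTime();
                Thread t = new Thread(() -> { /* empty payload */ });
                t.start();
                t.join();                        // wait for the child to finish
                total += System.nanoTime() - t0;
            }
            System.out.println("avg spawn + join cost ~ " + (total / N) + " [ns] per thread");
        }
    }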

Without knowing the actual costs in [ns] ( not depicted, for clarity and brevity, in the linked text's Fig.1 , yet principally always present; they are detailed and discussed in its trailing sections ), it makes almost no sense for anyone to try to read, and to understand in its full depth and code-design context, the criticism of Amdahl's Law.

It is so easy to pay more than one will receive at the end ...

For more details on this risk, check this answer and follow the link from its first paragraph, which leads to a fully interactive GUI-simulator of the actual costs of overheads, once they are introduced into the costs/benefits formula.
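
To see why those [ns] matter, plug them into the speedup formula itself. The classic Amdahl's Law gives S = 1 / ( ( 1 - p ) + p / N ) for a parallel fraction p on N processors; the overhead-strict re-formulation additionally charges the setup and teardown costs that the classic form ignores. A minimal sketch -- the 95 % parallel fraction and the 1 % overhead fractions are made-up illustration values:

    // Classic vs. overhead-strict Amdahl's Law speedup. The overhead-strict
    // variant charges add-on setup/teardown costs ( expressed here as
    // fractions of the original serial runtime ) that are paid no matter
    // how many processors are used. Sample numbers are illustrative only.
    public class AmdahlSpeedup {

        // classic:          S = 1 / ( ( 1 - p ) + p / n )
        static double classic(double p, int n) {
            return 1.0 / ((1.0 - p) + p / n);
        }

        // overhead-strict:  S = 1 / ( ( 1 - p ) + oSetup + p / n + oTeardown )
        static double overheadStrict(double p, int n, double oSetup, double oTeardown) {
            return 1.0 / ((1.0 - p) + oSetup + p / n + oTeardown);
        }

        public static void main(String[] args) {
            double p = 0.95;                                // assume 95 % is parallelizable
            for (int n : new int[]{1, 2, 4, 8, 16, 32}) {
                System.out.printf("n = %2d   classic = %5.2fx   overhead-strict = %5.2fx%n",
                                  n, classic(p, n), overheadStrict(p, n, 0.01, 0.01));
            }
        }
    }

Note how, with even a 2 % total overhead, the achievable speedup saturates far below the classic curve -- that is the "pay more than one will receive" trap in numbers.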

Back to your code:

The Sobel-filter kernel introduces non-local dependencies that a naive thread-mapping trips over, so it is better to start with a far simpler section, where absolute independence is directly visible:

This may save all the repetitive for(){...} -constructor overheads and increase performance:

    for (     int i = 0; i < h; i++ ) {
        for ( int j = 0; j < w; j++ ) {

            Color c = new Color( inicial[i * w + j] );

            procesarBN[i][j] = (int) ( 0.2989 * c.getRed()
                                     + 0.5870 * c.getGreen()
                                     + 0.1140 * c.getBlue()
                                       );
        }
    }

Instead of these three for(){...} passes:

    for (int i = 0; i < w * h; i++) {
        Color c = new Color(inicial[i]);
        resultadoR[i] = c.getRed();
        resultadoG[i] = c.getGreen();
        resultadoB[i] = c.getBlue();
    }

    int k = 0;
    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {
            procesarR[i][j] = resultadoR[k];
            procesarG[i][j] = resultadoG[k];
            procesarB[i][j] = resultadoB[k];
            k++;
        }
    }

    for (int i = 0; i < h; i++) {
        for (int j = 0; j < w; j++) {

            procesarBN[i][j] = (int) (0.2989 * procesarR[i][j] + 0.5870 * procesarG[i][j] + 0.1140 * procesarB[i][j]);

        }
    }

Effects?

In the [SERIAL] -part of Amdahl's Law:

  • at net zero add-on costs : improved / eliminated 2/3 of the for(){...} -constructor looping overhead costs
  • at net zero add-on costs : improved / eliminated the ( 4 * h * w * 3 ) mem-I/O ( i.e. not paying ~ h * w * 1.320+ [us] each !!! )
  • at net zero add-on costs : improved / eliminated the ( 4 * h * w * 3 * 4 ) mem-ALLOCs, again saving a remarkable amount of resources in both the [TIME] and [SPACE] polynomially-scaled domains of the complexity-ZOO taxonomy ( rough numbers in the sketch below ).
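
To put rough numbers on those bullets -- a tiny sketch; the 1920 x 1080 image size is an assumed example, and reading the ( 4 * h * w * 3 ) term as bytes ( 4 [B] per int, three channel arrays ) is this sketch's own assumption:

    // Back-of-the-envelope estimate of what the fused loop avoids, for an
    // assumed 1920 x 1080 input image. Interpreting the terms above as
    // bytes is an assumption made for illustration only.
    public class SavingsEstimate {
        public static void main(String[] args) {
            int h = 1080, w = 1920;                     // assumed image dimensions
            long pixels = (long) h * w;                 // 2 073 600 pixels
            long memIO  = 4L * pixels * 3;              // int writes into resultadoR/G/B
            long allocs = memIO * 2;                    // + the procesarR/G/B 2-D copies
            System.out.println("intermediate mem-I/O avoided : " + memIO  / 1e6 + " MB");
            System.out.println("transient allocations avoided: " + allocs / 1e6 + " MB");
        }
    }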

These per-pixel conversions may also safely run as [CONCURRENT] processing, since each pixel-value computation is principally independent here ( which is not the case in the Sobel step, nor in the contour-detector algorithm ).
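
For that independent part, here is a minimal sketch of the row-band mapping onto plain Java Threads with join() -- the very pattern the question proposed. The class name, the helper signature and the thread count are illustrative assumptions, not code from the original post:

    import java.awt.Color;

    // Minimal sketch: the fused per-pixel grayscale conversion, split into
    // horizontal row-bands, one band per worker thread, joined at the end.
    // Each worker writes a disjoint set of rows, so no synchronization is
    // needed while the workers run.
    public class ParallelGrayscale {

        static int[][] toGrayscale(final int[] inicial, final int w, final int h,
                                   final int nThreads) throws InterruptedException {
            final int[][] procesarBN = new int[h][w];
            Thread[] workers = new Thread[nThreads];

            for (int t = 0; t < nThreads; t++) {
                final int from = t * h / nThreads;        // first row of this band
                final int to   = (t + 1) * h / nThreads;  // one past the last row
                workers[t] = new Thread(() -> {
                    for (int i = from; i < to; i++) {
                        for (int j = 0; j < w; j++) {
                            Color c = new Color(inicial[i * w + j]);
                            procesarBN[i][j] = (int) (0.2989 * c.getRed()
                                                    + 0.5870 * c.getGreen()
                                                    + 0.1140 * c.getBlue());
                        }
                    }
                });
                workers[t].start();
            }
            for (Thread worker : workers) worker.join();   // wait for all bands
            return procesarBN;
        }
    }

After join() has returned for every worker, the Java memory model's happens-before rule guarantees the main thread sees all the bands; whether this actually pays off still depends on the image size versus the spawn/join costs benchmarked above.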

So, here,
any [CONCURRENT] or [PARALLEL] process-scheduling may help, if

  • at some non-zero add-on cost, the processing gets to harness multiple computing resources ( more than the 1 CPU-core that the original, pure- [SERIAL] code-execution operated ), onto which the pixel-grid will have been safely mapped ( via an available, resources-supported thread-pool or other code-processing facility ).

Yet,
any attempt to go non- [SERIAL] makes sense if and only if the lump sum of all the process-allocation / deallocation et al add-on costs gets at least justified by the increased amount of [CONCURRENT] -ly processed calculations.

Paying more than receiving is definitely not a smart move...

So, benchmark, benchmark and benchmark, before deciding what may have a positive effect on production code.

Always try to get improvements in the pure- [SERIAL] sections first, as these come at zero add-on cost and may still reduce the overall processing time.

QED above.
