I have 700 CSV files (about 5 MB each, with 1000 rows and 600 columns); call each one loadedFile. I have two more CSV files: FileA (20 MB, 3 columns, 100,000 rows) and FileB (30 MB, 2 columns, 100,000 rows).
Each of the 700 CSV files is loaded into a List&lt;String&gt; using
Files.readAllLines(filePath, StandardCharsets.ISO_8859_1);
Problem statement:
For each row of each loadedFile, I need to check whether its column A exists in column C of FileA. If it does, I then check whether the corresponding column B of FileA exists in column A of FileB. Only if both checks pass do I load that row of the loadedFile into a byte array.
Existing code:
public void createByteData(Path filePath, List<String> loadedFiles) {
    LOGGER.info("LOADING THE SCENARIO FILE : " + filePath);
    for (String loadedFile : loadedFiles) {
        String[] loadedFileColumns = loadedFile.split(",");
        String loadedFileFirstColumn = loadedFileColumns[0];
        // readFileA stores FileA in a private HashMap<String, String>: column C as key, column B as value
        if (readFileA.containsKey(loadedFileFirstColumn)) {
            String columnB = constructNumtra(readFileA.get(loadedFileFirstColumn));
            // readFileB stores FileB in a private HashMap<String, String>: column B as key, column A as value
            if (readFileB.containsKey(columnB)) {
                // LOGGER.info("INSTRUMENT FOUND IN PORTFOLIO NUMTRA: " + columnB);
                // TODO: convert the scenario file row to a byte array
            }
        }
    }
    LOGGER.info("Loading Completed for : " + filePath);
}
Also, I am free to use any collection for loading the files; here I have used ArrayList and HashMap.
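The maps readFileA and readFileB are referenced above but not shown. A minimal sketch of how they might be built, assuming FileA and FileB are comma-separated with columns in the order described (the class and method names here are illustrative, not from the question):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LookupTables {

    // FileA: key = column C, value = column B
    // (assumes comma-separated lines with columns A, B, C at indices 0, 1, 2)
    static Map<String, String> loadFileA(Path fileA) throws IOException {
        Map<String, String> map = new HashMap<>();
        for (String line : Files.readAllLines(fileA, StandardCharsets.ISO_8859_1)) {
            String[] cols = line.split(",");
            if (cols.length >= 3) {
                map.put(cols[2], cols[1]);
            }
        }
        return map;
    }

    // FileB: only membership of column A matters for the lookup,
    // so a HashSet gives the same O(1) check with less memory than a HashMap
    static Set<String> loadFileB(Path fileB) throws IOException {
        Set<String> set = new HashSet<>();
        for (String line : Files.readAllLines(fileB, StandardCharsets.ISO_8859_1)) {
            String[] cols = line.split(",");
            if (cols.length >= 1) {
                set.add(cols[0]);
            }
        }
        return set;
    }
}
```

If the real files use a different delimiter (the answer below splits FileA on ";"), the `split` argument would need to change accordingly.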
The code works correctly, but it takes a long time because the CSV files are large.
How can I optimize this?
One straightforward change is to parallelize the processing of loadedFiles: call createByteData for each loadedFile and use an ExecutorService to run the calls in parallel. Below is pseudocode for this:
public void createByteData(String loadedFile) {
    // Processing one row of one of the 700 loaded files
    String[] loadedFileColumns = loadedFile.split(","); // split the row to get its column values
    String loadedFileFirstColumn = loadedFileColumns[0]; // value of column 1, used for comparison
    for (Map.Entry<Path, List<String>> readFileA : mtmFiles.entrySet()) { // FileA contents from the HashMap
        List<String> linesOfFileA = readFileA.getValue();
        for (String lineFromFileA : linesOfFileA) {
            String[] columnsOfFileA = lineFromFileA.split(";"); // split to get column C of FileA
            // does the loaded row's column 1 value exist in column C of FileA?
            if (loadedFileFirstColumn.equals(columnsOfFileA[2])) {
                System.out.println("-----------Inside ------------");
                for (Map.Entry<Path, List<String>> readFileB : portfolioFiles.entrySet()) { // FileB contents from the HashMap
                    List<String> linesOfFileB = readFileB.getValue();
                    for (String lineFromFileB : linesOfFileB) {
                        String[] columnsOfFileB = lineFromFileB.split(","); // split to get column 1 of FileB
                        // does FileA's column B value exist in FileB?
                        if (columnsOfFileA[1].equals(columnsOfFileB[1])) {
                            // finally load the row from the respective loaded file into a byte array
                            System.out.println("------------------ Found match for " + loadedFileFirstColumn);
                        }
                    }
                }
            }
        }
    }
}
/* Driver function to exercise the method above */
public static void main(String[] args) throws InterruptedException {
    // size the pool to the available CPUs; 700 threads for CPU-bound work would mostly add contention
    ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    List<String> files = new ArrayList<>(); // populate with the rows of the loaded files
    for (String file : files) {
        executorService.submit(() -> createByteData(file));
    }
    executorService.shutdown();
    executorService.awaitTermination(100000, TimeUnit.HOURS);
}
What I would do is use the Stream API with its out-of-the-box parallel-processing support. A good explanation can be found here:
https://www.baeldung.com/java-8-parallel-streams-custom-threadpool
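A minimal sketch of that approach, running the parallel stream inside a dedicated ForkJoinPool (as the linked article describes) so it does not compete with the JVM-wide common pool. The class, method names, and the tagging done in createByteData here are placeholders for the question's real per-row logic:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelLoad {

    // Stand-in for the per-row work of createByteData; here it just tags the row
    static String createByteData(String row) {
        return "processed:" + row;
    }

    // Runs all rows through a parallel stream inside a dedicated ForkJoinPool
    static List<String> processAll(List<String> rows, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                rows.parallelStream()
                    .map(ParallelLoad::createByteData)
                    .collect(Collectors.toList())
            ).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `processAll(loadedFiles, Runtime.getRuntime().availableProcessors())` would then process the rows in parallel without any explicit thread management.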