I have 700 CSV files (about 5 MB each, with 1000 rows and 600 columns); call each one loadedFile. I have two more CSV files: FileA (20 MB, 3 columns, 100,000 rows) and FileB (30 MB, 2 columns, 100,000 rows).
Each of the 700 CSV files is loaded into a List&lt;String&gt; using
Files.readAllLines(filePath, StandardCharsets.ISO_8859_1);
Problem statement:
For each row of each loadedFile, I need to check whether its column A exists in column C of FileA. If it does, I then check whether the corresponding column B of FileA exists in column A of FileB. Only if both checks pass do I load that row of the loadedFile into a byte array.
Existing code:
public void createByteData(Path filePath, List<String> loadedFiles) {
    LOGGER.info("LOADING THE SCENARIO FILE : " + filePath);
    for (String loadedFile : loadedFiles) {
        String[] loadedFileColumns = loadedFile.split(",");
        String loadedFileFirstColumn = loadedFileColumns[0];
        // readFileA stores FileA in a private HashMap<String, String>: column C as key, column B as value
        if (readFileA.containsKey(loadedFileFirstColumn)) {
            String columnB = constructNumtra(readFileA.get(loadedFileFirstColumn));
            // readFileB stores FileB in a private HashMap<String, String>: column B as key, column A as value
            if (readFileB.containsKey(columnB)) {
                // LOGGER.info("INSTRUMENT FOUND IN PORTFOLIO NUMTRA: " + columnB);
                // TODO: convert the scenario file row to a byte array
            }
        }
    }
    LOGGER.info("Loading Completed for : " + filePath);
}
Also, I am free to use any collection for loading the files; here I have used ArrayList and HashMap.
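The maps readFileA and readFileB are referenced above but not shown. A minimal sketch of how they might be built, assuming FileA and FileB are comma-separated with columns in the order described (the class and method names here are illustrative, not from the question):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LookupTables {

    // FileA: key = column C, value = column B
    // (assumes comma-separated lines with columns A, B, C at indices 0, 1, 2)
    static Map<String, String> loadFileA(Path fileA) throws IOException {
        Map<String, String> map = new HashMap<>();
        for (String line : Files.readAllLines(fileA, StandardCharsets.ISO_8859_1)) {
            String[] cols = line.split(",");
            if (cols.length >= 3) {
                map.put(cols[2], cols[1]);
            }
        }
        return map;
    }

    // FileB: only membership of column A matters for the lookup,
    // so a HashSet gives the same O(1) check with less memory than a HashMap
    static Set<String> loadFileB(Path fileB) throws IOException {
        Set<String> set = new HashSet<>();
        for (String line : Files.readAllLines(fileB, StandardCharsets.ISO_8859_1)) {
            String[] cols = line.split(",");
            if (cols.length >= 1) {
                set.add(cols[0]);
            }
        }
        return set;
    }
}
```

If the real files use a different delimiter (the answer below splits FileA on ";"), the `split` argument would need to change accordingly.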
The code works correctly, but it takes a long time because the CSV files are large.
How can I optimize this?
One straightforward change is to parallelize the processing of loadedFiles: call createByteData for each loadedFile and use an ExecutorService to run the calls in parallel. Below is pseudocode for this:
public void createByteData(String loadedFile) {
    // Processing one row of one of the 700 loaded files
    String[] loadedFileColumns = loadedFile.split(","); // split the row to get its column values
    String loadedFileFirstColumn = loadedFileColumns[0]; // value of column 1, used for comparison
    for (Map.Entry<Path, List<String>> readFileA : mtmFiles.entrySet()) { // FileA contents from the HashMap
        List<String> linesOfFileA = readFileA.getValue();
        for (String lineFromFileA : linesOfFileA) {
            String[] columnsOfFileA = lineFromFileA.split(";"); // split to get column C of FileA
            // does the loaded row's column 1 value exist in column C of FileA?
            if (loadedFileFirstColumn.equals(columnsOfFileA[2])) {
                System.out.println("-----------Inside ------------");
                for (Map.Entry<Path, List<String>> readFileB : portfolioFiles.entrySet()) { // FileB contents from the HashMap
                    List<String> linesOfFileB = readFileB.getValue();
                    for (String lineFromFileB : linesOfFileB) {
                        String[] columnsOfFileB = lineFromFileB.split(","); // split to get column 1 of FileB
                        // does FileA's column B value exist in FileB?
                        if (columnsOfFileA[1].equals(columnsOfFileB[1])) {
                            // finally load the row from the respective loaded file into a byte array
                            System.out.println("------------------ Found match for " + loadedFileFirstColumn);
                        }
                    }
                }
            }
        }
    }
}
/* Driver function to exercise the method above */
public static void main(String[] args) throws InterruptedException {
    // size the pool to the available CPUs; 700 threads for CPU-bound work would mostly add contention
    ExecutorService executorService = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    List<String> files = new ArrayList<>(); // populate with the rows of the loaded files
    for (String file : files) {
        executorService.submit(() -> createByteData(file));
    }
    executorService.shutdown();
    executorService.awaitTermination(100000, TimeUnit.HOURS);
}
What I would do is use the Stream API with its out-of-the-box parallel-processing support. A good explanation can be found here:
https://www.baeldung.com/java-8-parallel-streams-custom-threadpool
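A minimal sketch of that approach, running the parallel stream inside a dedicated ForkJoinPool (as the linked article describes) so it does not compete with the JVM-wide common pool. The class, method names, and the tagging done in createByteData here are placeholders for the question's real per-row logic:

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.Collectors;

public class ParallelLoad {

    // Stand-in for the per-row work of createByteData; here it just tags the row
    static String createByteData(String row) {
        return "processed:" + row;
    }

    // Runs all rows through a parallel stream inside a dedicated ForkJoinPool
    static List<String> processAll(List<String> rows, int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() ->
                rows.parallelStream()
                    .map(ParallelLoad::createByteData)
                    .collect(Collectors.toList())
            ).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `processAll(loadedFiles, Runtime.getRuntime().availableProcessors())` would then process the rows in parallel without any explicit thread management.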