Want to compare two lists of records and save the common ones to a new list; there are around 1M records and it takes a lot of time to process
I'm processing two CSV files, checking for common entries, and saving them to a new CSV file. However, the comparison is taking a lot of time. My approach is to first read all the data from the files into ArrayLists, then, using parallelStream over the main list, compare each entry against the other list and append the common entries to a StringBuilder, which is then saved to the new CSV file. Below is my code for this.
allReconFileLines.parallelStream().forEach(baseLine -> {
    String[] baseLineSplitted = baseLine.split(",|,,");
    // Index 13 is read below, so the line needs at least 14 columns.
    if (baseLineSplitted.length > 13 && baseLineSplitted[13].trim().equalsIgnoreCase("#N/A")) {
        String baseKey = baseLineSplitted[3].replaceAll("^\"|\"$", "").trim();
        // Linear scan of the whole second list for every base line: this is the slow part.
        for (int i = 0; i < allCompleteFileLines.size(); i++) {
            String complteFileLine = allCompleteFileLines.get(i);
            String[] reconLineSplitted = complteFileLine.split(",|,,");
            if (reconLineSplitted.length > 3 && reconLineSplitted[3].replaceAll("^\"|\"$", "").trim().equals(baseKey)) {
                //pw.write(complteFileLine);
                // Note: matchedLines is shared across threads, so it must be thread-safe here.
                matchedLines.append(complteFileLine);
                break;
            }
        }
    }
});
pw.write(matchedLines.toString());
Currently it is taking hours to process. How can I make it quicker?
Read the keys of one file into e.g. a HashSet, and then as you're reading the second file, for each line check if its key is in the set and if so write it out. A HashSet lookup takes constant time on average, so this replaces the nested loop over both lists with a single pass over each file. This way you also only need enough memory to keep the keys of one file.
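A minimal sketch of that approach, under some assumptions: the class name CommonRecords, the method writeMatches and the three file paths are hypothetical, the column indexes 3 and 13 and the "#N/A" filter are taken from the question's code, and fields are assumed not to contain commas inside quotes (a plain split(",") is used). Unlike the question's loop, which stops at the first match for each base line, this writes every line of the second file whose key is in the set.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

public class CommonRecords {

    // Strip one leading or trailing double quote, then trim, as the question's code does.
    static String normalizeKey(String field) {
        return field.replaceAll("^\"|\"$", "").trim();
    }

    // Key of a line from the second file (column 3), or null if the line is too short.
    static String keyOf(String line) {
        String[] cols = line.split(",");
        return cols.length > 3 ? normalizeKey(cols[3]) : null;
    }

    // Hypothetical helper: reconFile and completeFile are the two inputs, outFile the result.
    static void writeMatches(Path reconFile, Path completeFile, Path outFile) throws IOException {
        // Pass 1: collect the keys of the "#N/A" lines from the first file.
        Set<String> wantedKeys = new HashSet<>();
        try (BufferedReader in = Files.newBufferedReader(reconFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",");
                if (cols.length > 13 && cols[13].trim().equalsIgnoreCase("#N/A")) {
                    wantedKeys.add(normalizeKey(cols[3]));
                }
            }
        }
        // Pass 2: stream the second file and write each line whose key is in the set.
        try (BufferedReader in = Files.newBufferedReader(completeFile);
             BufferedWriter out = Files.newBufferedWriter(outFile)) {
            String line;
            while ((line = in.readLine()) != null) {
                String key = keyOf(line);
                if (key != null && wantedKeys.contains(key)) {
                    out.write(line);
                    out.newLine();
                }
            }
        }
    }
}
```

Neither file is held in memory as a whole; only the set of keys is, and the second file is streamed line by line straight to the output.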