
Want to compare two lists of records and save common entries to a new list; records number around 1M and processing takes a lot of time

I'm processing two CSV files, checking for common entries, and saving them to a new CSV file. However, the comparison is taking a lot of time. My approach is to first read all the data from both files into ArrayLists, then use parallelStream over the main list; for each line I scan the other list for a match and append common entries to a StringBuilder, which is then saved to the new CSV file. Below is my code.

    allReconFileLines.parallelStream().forEach(baseLine -> {
        String[] baseLineSplitted = baseLine.split(",|,,");
        // Index 13 is accessed below, so the array must have at least 14 elements.
        if (baseLineSplitted.length > 13 && baseLineSplitted[13].trim().equalsIgnoreCase("#N/A")) {
            for (int i = 0; i < allCompleteFileLines.size(); i++) {
                String completeFileLine = allCompleteFileLines.get(i);
                String[] reconLineSplitted = completeFileLine.split(",|,,");
                // Compare column 3 of both lines, stripping surrounding quotes and whitespace.
                if (reconLineSplitted.length > 3
                        && reconLineSplitted[3].replaceAll("^\"|\"$", "").trim()
                            .equals(baseLineSplitted[3].replaceAll("^\"|\"$", "").trim())) {
                    matchedLines.append(completeFileLine);
                    break;
                }
            }
        }
    });
    pw.write(matchedLines.toString());

Currently it is taking hours to process. How can I make it quicker?

Read the keys of one file into e.g. a HashSet, and then, as you're reading the second file, check for each line whether its key is in the set and, if so, write it out. This way you only need enough memory to hold the keys of one file, and the lookup per line is O(1) instead of a scan over the whole other list.
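A minimal sketch of that idea, using the column positions from the question (column 3 as the join key, column 13 filtered on "#N/A"); the class and method names here are placeholders, not from the original code:

```java
import java.util.*;

public class CsvIntersect {

    // Normalize a join key: strip surrounding quotes and whitespace.
    static String key(String[] cols) {
        return cols[3].replaceAll("^\"|\"$", "").trim();
    }

    // Return every line of 'completeLines' whose column 3 matches column 3
    // of some base line whose column 13 is "#N/A".
    // One pass over each list: O(n + m) instead of O(n * m).
    static List<String> intersect(List<String> baseLines, List<String> completeLines) {
        // Pass 1: collect the keys of the qualifying base lines into a HashSet.
        Set<String> keys = new HashSet<>();
        for (String line : baseLines) {
            String[] cols = line.split(",");
            if (cols.length > 13 && cols[13].trim().equalsIgnoreCase("#N/A")) {
                keys.add(key(cols));
            }
        }
        // Pass 2: keep each complete line whose key is in the set.
        List<String> matched = new ArrayList<>();
        for (String line : completeLines) {
            String[] cols = line.split(",");
            if (cols.length > 3 && keys.contains(key(cols))) {
                matched.add(line);
            }
        }
        return matched;
    }
}
```

To avoid holding both files in memory, the second pass can instead read the complete file line by line with a BufferedReader and write matches straight to the output file; only the key set of the first file needs to stay in memory.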

