
Parsing an entire CSV file vs. parsing line by line in Java

I have a fairly large CSV file of approximately 80K to 120K rows (depending on the day). I'm successfully running code that parses the entire CSV file into Java objects using the @CsvBindByName annotation. Sample code:

    Reader reader = Files.newBufferedReader(Paths.get(file));
    CsvToBean<MyCustomClass> csvToBean = new CsvToBeanBuilder<MyCustomClass>(reader)
            .withType(MyCustomClass.class)
            .withIgnoreLeadingWhiteSpace(true)
            .build();
    // parse() reads every row and returns all beans in a single List
    List<MyCustomClass> myCustomClass = csvToBean.parse();

I want to change this code to parse the CSV file line by line instead of all at once, while keeping the neat mapping to a Java bean object. Essentially something like this:

    try (CSVReader csvReader = new CSVReader(Files.newBufferedReader(Paths.get(csvFileLoc)))) {
        String[] headerRow = csvReader.readNext(); // save the header row
        String[] nextLine;
        while ((nextLine = csvReader.readNext()) != null) {
            // create a fresh bean per row; reusing one instance would overwrite earlier rows
            MyCustomClass myCustomClass = new MyCustomClass();
            myCustomClass.setField1(nextLine[0]);
            myCustomClass.setField2(nextLine[1]);
            // ... and so on for each column, by fixed position
        }
    }

But the above solution ties me to knowing the column position of each field. What I would like is to map the string array I get from the CSV based on the header row, similar to what opencsv does when parsing the entire CSV file. However, as far as I can tell, I am not able to do that with opencsv. I had assumed this would be a pretty common practice, but I can't find any references to it online. It could be that I am not understanding the CsvToBean usage in the opencsv library correctly. I could use csvToBean.iterator to iterate over the beans, but I think the entire CSV file is loaded into memory by the build method, which kind of defeats the purpose of reading line by line. Any suggestions are welcome.
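For comparison, the name-based mapping described here can be hand-rolled with only the JDK. The approach reads the header row once and builds a map from column name to index. Each later row then looks up its fields by name, so column order no longer matters. This is a minimal sketch, not opencsv's API. The following are all assumptions for illustration:

- the `HeaderMappedReader` class;
- the column names `field1`/`field2`;
- the stand-in `MyCustomClass`;
- the naive `split(",")`, which does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for the question's bean; only two fields shown.
class MyCustomClass {
    private String field1;
    private String field2;

    void setField1(String value) { field1 = value; }
    void setField2(String value) { field2 = value; }
    String getField1() { return field1; }
    String getField2() { return field2; }
}

// Hypothetical helper, not part of opencsv.
class HeaderMappedReader {
    // Reads one row at a time and maps columns by header name, not position.
    // Naive split(","): quoted fields and embedded commas are NOT handled.
    static void forEachRow(BufferedReader in, Consumer<MyCustomClass> handler) throws IOException {
        String header = in.readLine();
        if (header == null) {
            return; // empty input: nothing to map
        }
        // Build the column-name -> index map once, from the header row.
        Map<String, Integer> index = new HashMap<>();
        String[] names = header.split(",", -1);
        for (int i = 0; i < names.length; i++) {
            index.put(names[i].trim(), i);
        }
        String line;
        while ((line = in.readLine()) != null) {
            String[] cols = line.split(",", -1); // -1 keeps trailing empty fields
            MyCustomClass bean = new MyCustomClass(); // fresh bean for every row
            bean.setField1(column(cols, index, "field1"));
            bean.setField2(column(cols, index, "field2"));
            handler.accept(bean); // process the row now instead of storing it
        }
    }

    // Looks up a value by column name, failing clearly if the column is absent.
    private static String column(String[] cols, Map<String, Integer> index, String name) {
        Integer i = index.get(name);
        if (i == null || i >= cols.length) {
            throw new IllegalArgumentException("missing column: " + name);
        }
        return cols[i].trim();
    }
}
```

Usage would look like `HeaderMappedReader.forEachRow(Files.newBufferedReader(Paths.get(csvFileLoc)), bean -> { /* handle one row */ })`. For real data, opencsv's `CSVReader.readNext()` could replace the naive split, which keeps the same header-map idea with proper quote handling.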

Looking at the API docs further, I see that CsvToBean<T> implements Iterable<T> and has an iterator() method that returns an Iterator<T>, documented as follows:

The iterator returned by this method takes one line of input at a time and returns one bean at a time.

So it looks like you could just write your loop as:

for (MyCustomClass myCustomClass : csvToBean) {
    // . . . do something with the bean . . .
}
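To see why a loop like this does not need the whole file in memory, here is a JDK-only sketch of the same pattern. It is an `Iterable` whose iterator reads the next line only when `hasNext()` or `next()` asks for it. Each line is turned into an object by a caller-supplied mapping function. `LazyBeanIterable` and its `linesRead` counter are hypothetical names, and this is not opencsv's implementation. It assumes the reader is already positioned past the header row and uses a naive comma split.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical illustration of "one line in, one bean out"; not opencsv code.
class LazyBeanIterable<T> implements Iterable<T> {
    private final BufferedReader in;
    private final Function<String[], T> mapper;
    int linesRead = 0; // only here to show how much input has been consumed

    LazyBeanIterable(BufferedReader in, Function<String[], T> mapper) {
        this.in = in;
        this.mapper = mapper;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private String pending; // next unread line, fetched on demand

            @Override
            public boolean hasNext() {
                if (pending == null) {
                    try {
                        pending = in.readLine(); // input happens here, one line at a time
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                    if (pending != null) {
                        linesRead++;
                    }
                }
                return pending != null;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                String line = pending;
                pending = null;
                return mapper.apply(line.split(",", -1)); // naive split, no quote handling
            }
        };
    }
}
```

In a for-each loop, each iteration consumes exactly one line from the reader. Only the current row's object needs to be alive at any moment, which matches the behavior the quoted documentation describes for the CsvToBean iterator.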

Just to clear up some potential confusion: you can see in the source code that the build() method of CsvToBeanBuilder only creates the CsvToBean object and doesn't read any input. The input is performed by the parse() method and by the iterator of the CsvToBean object.
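The split between a cheap `build()` and input-performing `parse()`/`iterator()` can be demonstrated with a hypothetical analogue in plain Java. `LineSourceBuilder`, `LineSource` and `CountingReader` are illustrative names, not opencsv classes. A counting `Reader` records how often data is actually requested, so the sketch can show that construction performs no reads. `parse()` drains everything into a `List`, while `iterator()` relies on the lazy stream from `BufferedReader.lines()`.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;

// Counts how many times the underlying reader is asked for data.
class CountingReader extends FilterReader {
    int reads = 0;

    CountingReader(Reader in) {
        super(in);
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        reads++;
        return super.read(buf, off, len);
    }
}

// Hypothetical analogue of CsvToBeanBuilder: build() only wires objects together.
class LineSourceBuilder {
    private final Reader reader;

    LineSourceBuilder(Reader reader) {
        this.reader = reader;
    }

    LineSource build() {
        return new LineSource(new BufferedReader(reader)); // no input performed here
    }
}

// Hypothetical analogue of CsvToBean: parse() and iterator() are where input happens.
class LineSource implements Iterable<String> {
    private final BufferedReader in;

    LineSource(BufferedReader in) {
        this.in = in;
    }

    // Eager: reads every remaining line and keeps them all in memory.
    List<String> parse() {
        return in.lines().collect(Collectors.toList());
    }

    // Lazy: BufferedReader.lines() pulls lines only as the iterator advances.
    @Override
    public Iterator<String> iterator() {
        return in.lines().iterator();
    }
}
```

With 80K to 120K rows, the practical difference is that `parse()` holds every result at once, whereas iterating holds roughly one row at a time. `build()` costs the same either way.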
