Parsing part of a CSV file in Java

I need to deal with a CSV file that actually contains several tables, like this:

"-------------------- Section 1 --------------------"

"Identity:","ABC123"
"Initials:","XY"
"Full Name:","Roger"
"Street Address:","Foo St"


"-------------------- Section 2 --------------------"

"Line","Date","Time","Status",

"1","30/01/2013","10:49:00 PM","ON",
"2","31/01/2013","8:04:00 AM","OFF",
"3","31/01/2013","11:54:00 PM","OFF",


"-------------------- Section 3 --------------------"

I'd like to parse the blocks in each section with something like commons-csv, but it would help to handle each section individually, stopping at the double newline as if it were the end of the file. Has anyone already tackled this problem?

NOTE: Files can be arbitrarily long and can contain any number of sections, so I'm after a single pass if possible. Each section appears to start with a titled heading (------- title ------\n\n) and end with two empty lines.

How about using java.io.FilterReader? You can work out which Reader methods you need to override by trial and error. Your custom class has to read ahead an entire line and check whether it is a 'Section' line. If it is, the class returns EOF, which stops the commons-csv parser. You can then read the next section from the same custom class. Not elegant, but it should work. For example:

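import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;

// Returns -1 (end of stream) at every "Section" heading, so that each
// section looks like a separate file to whatever reads from it.
class MyReader extends FilterReader {
    private final BufferedReader lines;
    private String line;        // current line plus "\r\n", or null
    private int pos;            // index of the next char to return
    private boolean pendingEof; // a bulk read hit a heading after reading some chars

    public MyReader(BufferedReader in) {
        super(in);
        lines = in;
    }

    @Override
    public int read() throws IOException {
        if (pendingEof) { pendingEof = false; return -1; }
        if (line == null || pos >= line.length()) {
            do {                               // skip blank lines
                line = lines.readLine();
            } while (line != null && line.isEmpty());
            if (line == null) return -1;       // real end of file
            line = line + "\r\n";
            pos = 0;
        }
        if (line.contains("-------------------- Section ")) {
            line = null;                       // consume the heading
            return -1;                         // end of the current section
        }
        return line.charAt(pos++);
    }

    // FilterReader.read(char[], int, int) goes straight to the wrapped reader and
    // bypasses read() above; BufferedReader and parsers such as commons-csv use it.
    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int n = 0;
        while (n < len) {
            int c = read();
            if (c == -1) {
                if (n > 0) pendingEof = true;  // report the boundary on the next call
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return (n == 0 && len > 0) ? -1 : n;
    }
}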

You would use it like so:

public void run() throws Exception {
    // records.txt (the sample above) is loaded from the classpath
    try (MyReader reader = new MyReader(new BufferedReader(new InputStreamReader(
            ReadRecords.class.getResourceAsStream("/records.txt"))))) {
        // One pass per heading: the first prints nothing because the file
        // starts with the "Section 1" heading; the next two print sections 1 and 2.
        for (int i = 0; i < 3; i++) {
            int c;
            while ((c = reader.read()) != -1) {
                System.out.print((char) c);
            }
        }
    }
}

If the whole file fits in memory, you can read it into a String named content and use String.split() to access the individual CSV sections:

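for (String part : content.split("\"----+ Section \\d+ ----+\"")) {

    // Each piece keeps the blank lines around it, so trim before testing
    String csv = part.trim();

    // Skip empty sections (e.g. the text before the first heading)
    if (csv.isEmpty()) continue;

    // parse and process each individual "csv" section here
}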
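A self-contained, runnable version of the split approach, applied to a shortened copy of the sample from the question. It assumes the whole file has already been read into a String; the class name SectionSplit and the helper sections() are illustrative only. Because the sample's Section 3 is empty, it produces no entry.

```java
import java.util.ArrayList;
import java.util.List;

public class SectionSplit {
    // Splits on the quoted "---- Section N ----" headings and returns the
    // non-empty sections with their surrounding blank lines trimmed.
    static List<String> sections(String content) {
        List<String> result = new ArrayList<>();
        for (String part : content.split("\"----+ Section \\d+ ----+\"")) {
            String csv = part.trim();
            if (!csv.isEmpty()) {
                result.add(csv);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String content = String.join("\n",
            "\"-------------------- Section 1 --------------------\"", "",
            "\"Identity:\",\"ABC123\"",
            "\"Initials:\",\"XY\"", "", "",
            "\"-------------------- Section 2 --------------------\"", "",
            "\"Line\",\"Date\",\"Time\",\"Status\",", "",
            "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",", "", "",
            "\"-------------------- Section 3 --------------------\"");
        List<String> parts = sections(content);
        System.out.println(parts.size()); // 2
        System.out.println(parts.get(0));
    }
}
```

Each returned string can then be handed to a CSV parser on its own. This only suits files small enough to hold in memory.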

If the file contains just the two sections, delimited as in the example, processing it is straightforward:

  1. Create a Java BufferedReader object to read the file line-by-line
  2. Read Section 1 and extract the key-value pairs
  3. Read and ignore the remaining lines, until the CSV header (Section 2)
  4. Initialize a CSV parser (commons-csv or another library) using the header and the other parameters (comma separator, quotes, etc.)
  5. Process every subsequent line with the parser
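The five steps can be sketched with nothing but the JDK. This is a minimal sketch under stated assumptions: the file starts with the Section 1 heading, and the hypothetical fields() helper stands in for commons-csv, handling only simple quoted fields with no embedded quotes or commas. TwoSectionReader and its members are illustrative names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TwoSectionReader {
    static final String HEADING = "-------------------- Section ";

    final Map<String, String> details = new LinkedHashMap<>(); // Section 1 key-value pairs
    String[] header = new String[0];                           // Section 2 column names
    final List<String[]> rows = new ArrayList<>();            // Section 2 records

    // Hypothetical stand-in for a real CSV parser: strips the quotes from
    // "a","b" style lines; embedded quotes or commas are not supported.
    // String.split drops the empty field after a trailing comma.
    static String[] fields(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            String f = parts[i].trim();
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1);
            }
            parts[i] = f;
        }
        return parts;
    }

    // Step 1 is the caller's BufferedReader; steps 2 to 5 follow
    void read(BufferedReader in) throws IOException {
        in.readLine(); // the "Section 1" heading, assumed to be the first line
        String line;
        // Step 2: key-value pairs until the next heading
        while ((line = in.readLine()) != null && !line.contains(HEADING)) {
            String[] kv = fields(line);
            if (kv.length == 2) details.put(kv[0], kv[1]);
        }
        // Step 3: ignore blank lines up to the CSV header
        while ((line = in.readLine()) != null && line.isBlank()) {
            // nothing to do
        }
        if (line == null) return;
        // Step 4: the header row configures the "parser"
        header = fields(line);
        // Step 5: every further non-blank line is one record, up to the next heading
        while ((line = in.readLine()) != null && !line.contains(HEADING)) {
            if (!line.isBlank()) rows.add(fields(line));
        }
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
            "\"-------------------- Section 1 --------------------\"", "",
            "\"Identity:\",\"ABC123\"", "\"Full Name:\",\"Roger\"", "", "",
            "\"-------------------- Section 2 --------------------\"", "",
            "\"Line\",\"Date\",\"Time\",\"Status\",", "",
            "\"1\",\"30/01/2013\",\"10:49:00 PM\",\"ON\",", "", "");
        TwoSectionReader r = new TwoSectionReader();
        r.read(new BufferedReader(new StringReader(text)));
        System.out.println(r.details);     // {Identity:=ABC123, Full Name:=Roger}
        System.out.println(r.rows.size()); // 1
    }
}
```

Only one line is held in memory at a time. In real code, steps 4 and 5 would pass the header and remaining lines to a proper CSV parser instead of fields().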

The parser will provide an iterator-like API that reads each line into a Java object, from which reading the fields is trivial. This streaming approach is far better than loading everything into memory first, because it works for any file size.
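To illustrate turning a parsed row into a Java object, the sketch below maps the four Section 2 values to a hypothetical StatusRow record (Java 16+). The day-first date pattern d/M/yyyy and the US-locale AM/PM markers are assumptions drawn from the sample rows, not from any specification of the file.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class StatusRows {
    // Hypothetical value type for one Section 2 row
    record StatusRow(int line, LocalDateTime timestamp, boolean on) {}

    // Matches "30/01/2013" + " " + "10:49:00 PM" (assumed day-first dates)
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("d/M/yyyy h:mm:ss a", Locale.US);

    // fields holds the Line, Date, Time and Status values, quotes already removed
    static StatusRow toRow(String[] fields) {
        return new StatusRow(
            Integer.parseInt(fields[0]),
            LocalDateTime.parse(fields[1] + " " + fields[2], FORMAT),
            "ON".equals(fields[3]));
    }

    public static void main(String[] args) {
        StatusRow row = toRow(new String[] {"2", "31/01/2013", "8:04:00 AM", "OFF"});
        System.out.println(row.timestamp()); // 2013-01-31T08:04
    }
}
```

Combining the date and time columns into one LocalDateTime keeps each row sortable and comparable without further string handling.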