
How to split a CSV file when some columns contain line-separated values instead of comma-separated ones

I am reading a comma-separated file line by line, but a few columns contain line-separated values instead of comma-separated ones, and I am getting an IndexOutOfBoundsException. Is there any way to fix this?


    if (latestRoleFile != null) {
        String rePattern = "(\"[^\",]++),([^\"]++\")";
        Pattern pattern = Pattern.compile(rePattern);
        String fileLocation = directoryLocation + "\\" + latestRoleFile;
        File file = new File(fileLocation);
        InputStream inputStream = null;
        try {
            inputStream = new FileInputStream(file);
            BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
            String line = null;
            br.readLine();
            while ((line = br.readLine()) != null) {

                Matcher matcher = pattern.matcher(line);
                if (matcher.find()) {
                    String newString = line.replaceAll(rePattern, "$1|$2");
                    line = newString;
                    line = line.replace("\"", "");
                }

                String[] chunks = line.split(",");
                String subRoleId = chunks[0];
                String subSubscriberId = chunks[1];
                String name = chunks[2];
                HashMap innerMap = new HashMap();
                innerMap.put("SubRoleId", subRoleId);
                innerMap.put("SubSubscriberId", subSubscriberId);
                innerMap.put("Name", name);
                subRoleData.put(subRoleId, innerMap);
            }
        } catch (IOException e) {
            System.out.println(e.getLocalizedMessage());
        }
    }

The sample file is:

    6,1,"Senior Claims Specialist 1","In active role ",False
    7,1,"Underwriter","Lisandra Noto, Melissa, Alanna, Jared, Chris, Dana, Bieloh,Ben, Samantha ",True
    8,1,"AVP Lead Underwriter","Bechel, William
    Hatutale, Anneline
    Johnson, Kirsten
    Markovich, Daniel
    Nace, Patti
    Sullivan, Zachary
    Toohey, Felicia
    Woodward, Mark",True
    9,1,"VP, Underwriting Operations ","Beckie Wendorf",True
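To see where the exception likely comes from, here is a minimal sketch that splits two physical lines of the sample the same way the loop above does. The class name `SplitFailureDemo` and the helper `chunkCount` are hypothetical and exist only for this illustration.

```java
// Hypothetical demo class: shows how many chunks String.split(",") yields per physical line.
class SplitFailureDemo {

    // Counts the pieces produced by a naive comma split, as in the question's loop.
    public static int chunkCount(String physicalLine) {
        return physicalLine.split(",").length;
    }

    public static void main(String[] args) {
        // A complete record from the sample file
        String complete = "6,1,\"Senior Claims Specialist 1\",\"In active role \",False";
        // A continuation line that belongs to the quoted value of record 8
        String continuation = "    Hatutale, Anneline";

        System.out.println(chunkCount(complete));      // prints 5
        System.out.println(chunkCount(continuation));  // prints 2
    }
}
```

A continuation line yields only two chunks, so reading `chunks[2]` from it throws an ArrayIndexOutOfBoundsException, which is a subclass of IndexOutOfBoundsException.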

I've had a lot of success using third-party libraries (such as opencsv) to handle well-formatted CSV files. There are lots of gotchas that can occur when attempting to craft a DIY CSV parser.

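import com.opencsv.CSVReader;
import java.io.FileReader;
import java.util.HashMap;

...

// CSVReader wraps a Reader (not an InputStream) and keeps quoted fields intact,
// including commas and line breaks inside the quotes.
// readNext() throws IOException (newer opencsv versions also declare
// CsvValidationException), so call this from code that handles them.
try (CSVReader reader = new CSVReader(new FileReader(file))) {
    reader.readNext(); // discard the first line, as the question's code does
    String[] line;
    while ((line = reader.readNext()) != null) {
        HashMap<String, String> innerMap = new HashMap<>();
        innerMap.put("SubRoleId", line[0]);
        innerMap.put("SubSubscriberId", line[1]);
        innerMap.put("Name", line[2]);
        subRoleData.put(line[0], innerMap);
    }
}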

If, for whatever reason, you are not able to import a jar file and use its classes, you'll have to employ a much more error-prone technique. Assuming your input file does not contain any escaped quotes, you can count the quotes in a line. If the count is odd, a quoted value is still open and continues on the next line, so you need to read that line in as well.

Here is some code that could help. I haven't tested it; it is only meant to give you an idea of what you can do.

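// Counts the double-quote characters in a string.
public int countQuotes(String string) {
    int count = 0;
    for (int i = 0; i < string.length(); i++) {
        if (string.charAt(i) == '"')
            count++;
    }
    return count;
}

// Reads one logical record, joining physical lines until the quote count is even.
// Returns null at end of input (or if an IOException occurs).
public String getNextLine(BufferedReader reader) {
    try {
        StringBuilder multiLine = new StringBuilder();
        do {
            String line = reader.readLine();
            if (line == null)
                return null;
            // readLine() strips the line break; put it back so joined values stay separated
            if (multiLine.length() > 0)
                multiLine.append('\n');
            multiLine.append(line);
        } while (countQuotes(multiLine.toString()) % 2 != 0);
        return multiLine.toString();
    } catch (IOException e) {
        return null;
    }
}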

You can now call getNextLine in a loop and know that every string returned contains an even number of quotes. When getNextLine returns null, the whole file has been processed. Please note that this solution will not return the last record if the CSV file is poorly formatted (has an unterminated quote).
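Each string returned this way still has to be split into fields, and a plain `split(",")` would again break on commas inside quotes. The following is a rough sketch of one way to do that, under the same assumption of no escaped quotes. The class `QuotedCsvSketch` and the methods `nextRecord` and `splitRecord` are hypothetical names introduced here, and the sample data is a shortened version of the file from the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: quote-aware record reading and field splitting.
class QuotedCsvSketch {

    // Joins physical lines until the number of quotes is even.
    // Returns null at end of input or if the last record has an unterminated quote.
    public static String nextRecord(BufferedReader reader) {
        try {
            StringBuilder record = new StringBuilder();
            int quotes = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                if (record.length() > 0) record.append('\n');
                record.append(line);
                for (int i = 0; i < line.length(); i++) {
                    if (line.charAt(i) == '"') quotes++;
                }
                if (quotes % 2 == 0) return record.toString();
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Splits on commas that are outside quotes and drops the quote characters.
    // Assumes the record contains no escaped quotes.
    public static List<String> splitRecord(String record) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < record.length(); i++) {
            char c = record.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;            // toggle state, do not keep the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(field.toString());    // field boundary
                field.setLength(0);
            } else {
                field.append(c);                 // regular character, comma or line break inside quotes
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Shortened sample data based on the question's file
        String sample =
            "7,1,\"Underwriter\",\"Lisandra Noto, Melissa\",True\n" +
            "8,1,\"AVP Lead Underwriter\",\"Bechel, William\n" +
            "Hatutale, Anneline\",True\n";
        BufferedReader reader = new BufferedReader(new StringReader(sample));
        String record;
        while ((record = nextRecord(reader)) != null) {
            List<String> fields = splitRecord(record);
            System.out.println(fields.get(0) + " -> " + fields.get(2)
                    + " (" + fields.size() + " fields)");
        }
    }
}
```

With the shortened sample, both records come back with five fields, and the fourth field of record 8 keeps its embedded line break. A real CSV library also handles escaped quotes and other edge cases that this sketch ignores.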
