Parsing comma-separated values enclosed with quotes

I'm trying to parse comma-separated values that are enclosed in quotes, using only standard Java libraries (I know this must be possible).

As an example, file.txt contains one row per line:

"Foo","Bar","04042013","04102013","Stuff"
"Foo2","Bar2","04042013","04102013","Stuff2"

However when I parse the file with the code I've written so far:

import java.io.*;
import java.util.Arrays;
 public class ReadCSV{

    public static void main(String[] arg) throws Exception {

        BufferedReader myFile = new BufferedReader(new FileReader("file.txt"));

        String myRow = myFile.readLine(); 
        while (myRow != null){
            //split by comma separated quote enclosed values
            //BUG - first and last values get an extra quote
            String[] myArray = myRow.split("\",\""); //the problem

            for (String item:myArray) { System.out.print(item + "\t"); }
            System.out.println();
            myRow = myFile.readLine();
        }
        myFile.close();
    }
}

However the output is

"Foo    Bar     04042013        04102013        Stuff"

"Foo2   Bar2    04042013        04102013        Stuff2"

Instead of

Foo    Bar     04042013        04102013        Stuff

Foo2   Bar2    04042013        04102013        Stuff2

I know I went wrong on the Split but I'm not sure how to fix it.

Before calling split, remove the first and last double quote from myRow using the line below.

myRow = myRow.substring(1, myRow.length() - 1);

(UPDATE) Also check that myRow has at least two characters. Otherwise the code above throws a StringIndexOutOfBoundsException, both for an empty string and for a single-character one. For example, the code below removes the double quotes only when the string is long enough.

if (myRow.length() >= 2) {
    myRow = myRow.substring(1, myRow.length() - 1);
}
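
Put together with the reading loop from the question, this might look like the sketch below. The class name ReadQuotedRows and the helper splitRow are made up for illustration, a StringReader stands in for file.txt so the example is self-contained, and it assumes every row is fully quoted and no value contains the sequence "," itself.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadQuotedRows {

    // Hypothetical helper: strips the outer quotes, then splits on ",".
    // Assumes the row looks like "a","b","c".
    public static String[] splitRow(String row) {
        if (row.length() < 2) {
            return new String[0]; // too short to hold both outer quotes
        }
        String inner = row.substring(1, row.length() - 1);
        return inner.split("\",\"");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new FileReader("file.txt").
        String data = "\"Foo\",\"Bar\",\"04042013\",\"04102013\",\"Stuff\"\n"
                + "\"Foo2\",\"Bar2\",\"04042013\",\"04102013\",\"Stuff2\"\n";
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String row;
            while ((row = reader.readLine()) != null) {
                // Print the values of each row separated by tabs, as in the question.
                System.out.println(String.join("\t", splitRow(row)));
            }
        }
    }
}
```

The try-with-resources block closes the reader even if reading fails, which the original loop does not guarantee.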

I think you will probably have to go for a stateful approach, basically like the code below (another state would be necessary if you want to allow escaping of quotes within a value):

import java.util.ArrayList;
import java.util.List;


public class CSV {

    public static void main(String[] args) {
        String s = "\"hello, i am\",\"a string\"";
        String x = s;
        List<String> l = new ArrayList<String>();
        int state = 0;
        while(x.length()>0) {
            if(state == 0) {
                if(x.indexOf("\"")>-1) {
                    x = x.substring(x.indexOf("\"")+1).trim();
                    state = 1;
                } else {
                    break;
                }
            } else if(state == 1) {
                if(x.indexOf("\"")>-1) {
                    String found = x.substring(0,x.indexOf("\"")); 
                    System.err.println("found: "+found);
                    l.add(found);
                    x = x.substring(x.indexOf("\"")+1).trim();
                    state = 2; // a comma must follow before the next value (state 2 was unreachable before)
                } else {
                    throw new RuntimeException("bad format");
                }
            } else if(state == 2) {
                if(x.indexOf(",")>-1) {
                    x = x.substring(x.indexOf(",")+1).trim();
                    state = 0;
                } else {
                    break;
                }
            }
        }
        for(String f : l) {
            System.err.println(f);
        }
    }


}
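
For the escaping case mentioned above, a character-by-character variant could look roughly like this. It is only a sketch under assumptions that are not in the answer: the class name QuotedCsvLine and the method parse are invented, a quote inside a value is assumed to be escaped by doubling it (""), and whitespace outside quotes is skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvLine {

    // Hypothetical helper: walks the line once, tracking whether we are inside quotes.
    public static List<String> parse(String line) {
        List<String> values = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // assumed escape: "" means a literal quote
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are plain data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                values.add(current.toString()); // delimiter ends the current value
                current.setLength(0);
            } else if (!Character.isWhitespace(c)) {
                current.append(c);           // unquoted character, kept as-is
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("bad format: unterminated quote");
        }
        values.add(current.toString());      // last value has no trailing comma
        return values;
    }

    public static void main(String[] args) {
        for (String value : parse("\"hello, i am\",\"a string\"")) {
            System.out.println(value);
        }
    }
}
```

Keeping the state in a single boolean avoids the repeated indexOf and substring calls, at the cost of handling every character explicitly.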

Alternatively, you can use replaceAll, which to me looks more suitable for this task:

myRow = myRow.replaceAll("\"", "").replaceAll(","," ");

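This replaces every " with nothing (removing them), then replaces every , with a space (you can of course use more spaces). Note that it also changes any quotes or commas that occur inside a value, and that the result is a single string rather than an array of values.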

The problem in the code snippet above is that you are splitting the String on "," (quote, comma, quote). At the start of the line ("Foo",") and at the end (","Stuff") the opening and closing quotes are not part of a "," sequence, so the split does not remove them.

So this is definitely not a bug in Java; you need to handle that part yourself.

You have multiple options. Some of them are:
1. If you are sure there will always be a starting " and an ending ", you can remove them from the String before splitting.
2. If the starting and ending " are optional, you can first check for them with startsWith and endsWith, and remove them if they exist before splitting.
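
The second option could be sketched like this. The class name OptionalQuotes and the helper stripOuterQuotes are invented for the example, and it only strips a quote pair when both the leading and the trailing quote are present.

```java
public class OptionalQuotes {

    // Hypothetical helper: removes one leading and one trailing quote, if both are present.
    public static String stripOuterQuotes(String row) {
        // The length check keeps a lone " from counting as both the start and the end.
        if (row.length() >= 2 && row.startsWith("\"") && row.endsWith("\"")) {
            return row.substring(1, row.length() - 1);
        }
        return row; // nothing to strip
    }

    public static void main(String[] args) {
        String row = "\"Foo\",\"Bar\",\"Stuff\"";
        // Strip first, then split on the quote-comma-quote sequence.
        for (String value : stripOuterQuotes(row).split("\",\"")) {
            System.out.println(value);
        }
    }
}
```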

You can simply split the string on commas and then delete the first and last '"' from each token. Hope that's helpful.

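import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerCsv {
    public static void main(String[] args) {
        String s = "\"Foo\",\"Bar\",\"04042013\",\"04102013\",\"Stuff\"";
        // A list instead of a fixed-size array, so rows with more than 10 values also work.
        List<String> values = new ArrayList<String>();
        System.out.println(s);

        // Split on commas; this assumes no value contains a comma itself.
        Scanner scanner = new Scanner(s);
        scanner.useDelimiter(",");

        while (scanner.hasNext()) {
            String token = scanner.next();
            // Drop the enclosing quotes of each token.
            values.add(token.substring(1, token.length() - 1));
        }
        scanner.close();

        System.out.println(values.get(0));
        System.out.println(values.get(1));
        System.out.println(values.get(2));
    }
}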

This solution is less elegant than a String.split() one-liner. The advantage is that we avoid fragile string manipulation, i.e. the use of String.substring(). The string must end with ," however.

This version handles spaces between delimiters. Delimiter characters within quotes are ignored as expected, as are backslash-escaped quotes (\" in the data, written \\\" in a Java string literal).

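// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "\"F\\\",\\\"oo\"  ,    \"B,ar\",\"04042013\",\"04102013\",\"St,u\\\"ff\"";
// The optional prefix consumes the opening quote of the first value;
// without it the first result would keep its leading quote.
Pattern p = Pattern.compile("(?:^\\s*\")?(.*?)\"\\s*,\\s*\"");
Matcher m = p.matcher(s + ",\""); // String must end with ,"
while (m.find()) {
    String result = m.group(1); // escaped quotes keep their backslash
    System.out.println(result);
}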
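
As a variation (not the pattern above), one could also match each quoted value as a whole, which removes the need to append ," to the input. This is only a sketch: the class name QuotedFieldMatcher and the method fields are invented, quotes are assumed to be escaped with a backslash as in the example string, and anything between the quoted values is skipped without validation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldMatcher {

    // A quote, then any run of "ordinary character" or "backslash plus any character",
    // then the closing quote. Group 1 is the content between the quotes.
    private static final Pattern FIELD = Pattern.compile("\"((?:[^\"\\\\]|\\\\.)*)\"");

    // Hypothetical helper: returns the content of every quoted value in the line.
    public static List<String> fields(String line) {
        List<String> result = new ArrayList<String>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Turn the escaped form \" back into a plain quote.
            result.add(m.group(1).replace("\\\"", "\""));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "\"F\\\",\\\"oo\"  ,    \"B,ar\",\"04042013\",\"04102013\",\"St,u\\\"ff\"";
        for (String field : fields(s)) {
            System.out.println(field);
        }
    }
}
```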
