Groovy - CSVParsing - How to split a string by comma outside double quotes without using any external libraries

I have a CSV file like the one below:

COL1,COL2,COL3,COL4
3920,10163,"ST. PAUL, MN",TWIN CITIES

I want to read the file and split each line on the commas that lie outside double quotes, WITHOUT using any external libraries. For example, the data row in the CSV above should be split into 4 parts:
1. 3920
2. 10163
3. ST. PAUL, MN
4. TWIN CITIES

I tried a regex with the following code, but it never worked. I want to make this work in Groovy. I also tried different solutions written for Java, but could not get any of them to work.

NOTE: I don't want to use any external Grails/JAR dependencies to make this work.

def staticCSV = new File("staticMapping.csv")
staticCSV.eachLine { line ->
    // This pattern is the attempt that did not work
    def parts = line.split(",(?=(?:[^\"]\"[^\"]\")[^\"]\${1})")
    parts.each {
        println "${it}"
    }
}

I found a solution:

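def getcsvListofListFromFile(String fileName) {
    def lol = []   // list of rows; each row is a list of field strings
    // Split on a comma only when an even number of double quotes follows it,
    // i.e. when the comma lies outside a quoted field.
    def r1 = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*\$)"

    try {
        def csvf = new File(fileName)
        csvf.eachLine { line ->
            def c1 = line.split(r1)
            def c2 = []
            c1.each { e1 ->
                // Strip the enclosing double quotes, if any
                def s = e1.toString().replaceAll('^"', "").replaceAll('"\$', "")
                c2.add(s)
            }
            lol.add(c2)
        }
        return lol
    } catch (Exception e) {
        def eMsg = "Error Reading file [" + fileName + "] --- " + e.getMessage()
        throw new RuntimeException(eMsg, e)   // keep the original exception as the cause
    }
}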
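
To see what the lookahead in that regex does without needing a file, here is a minimal sketch that applies the same pattern to a single string. The method name splitOutsideQuotes is hypothetical and not part of the original post. It also passes a limit of -1 to split, which is an addition here: without it, String.split silently drops trailing empty fields.

```groovy
// Hypothetical helper for illustration; same pattern as above, written as a slashy string.
def splitOutsideQuotes(String line) {
    // A comma is a separator only if the rest of the line contains an even number of quotes.
    def pattern = /,(?=(?:[^"]*"[^"]*")*[^"]*$)/
    def raw = line.split(pattern, -1)                    // -1 keeps trailing empty fields
    return raw.collect { it.replaceAll(/^"|"$/, '') }    // strip one enclosing quote at each end
}

println splitOutsideQuotes('3920,10163,"ST. PAUL, MN",TWIN CITIES')
println splitOutsideQuotes('a,b,')
```

The comma inside "ST. PAUL, MN" is followed by a single quote character (an odd count), so the lookahead fails there and the field stays in one piece.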

Using a ready-made library is a better idea, but you certainly have your reasons. Here is an alternative to your solution. It splits each line on every comma and then reassembles the parts that originally belonged to the same quoted field (see multiPart in the code).

def content =
"""COL1,COL2,COL3,COL4
   3920,10163, "ST. PAUL, MN" ,TWIN CITIES
   3920,10163, "   ST. PAUL, MN " ,TWIN CITIES, ,"Bla,Bla, Bla" """  

content.eachLine {line ->
    def multiPart
    for (part in line.split(/,/)) {
        if (!part.trim()) continue         // for empty parts 
        if (part =~ /^\s*\"/) {            // beginning of a multipart
            multiPart = part
            continue
        } else if (part =~ /"\s*$/) {      // end of the multipart
            multiPart += "," + part
            println multiPart.replaceAll(/"/, "").trim()
            multiPart = null
            continue
        }        
        if (multiPart) {
            multiPart += "," + part
        } else {
            println part.trim()
        }        
    }
}

Output (you can copy the code directly into the GroovyConsole to run it):

COL1
COL2
COL3
COL4
3920
10163
ST. PAUL, MN
TWIN CITIES
3920
10163
ST. PAUL, MN
TWIN CITIES
Bla,Bla, Bla
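
A third library-free option is to scan each line once and track whether the current character is inside quotes. The sketch below is only an illustration: the method name splitCsvLine is hypothetical, and it assumes that a doubled quote ("") inside a quoted field stands for one literal quote, which neither post above states. Unlike the reassembly loop, it keeps empty fields.

```groovy
// Hypothetical helper: single pass over the line with an "inside quotes" flag.
List<String> splitCsvLine(String line) {
    def fields = []
    def current = new StringBuilder()
    boolean inQuotes = false
    for (int i = 0; i < line.length(); i++) {
        String c = line[i]
        if (c == '"') {
            // Assumption: "" inside a quoted field is an escaped literal quote
            if (inQuotes && i + 1 < line.length() && line[i + 1] == '"') {
                current.append('"')
                i++                      // skip the second quote of the pair
            } else {
                inQuotes = !inQuotes     // opening or closing quote; not copied to the field
            }
        } else if (c == ',' && !inQuotes) {
            fields << current.toString() // comma outside quotes ends the field
            current.setLength(0)
        } else {
            current.append(c)
        }
    }
    fields << current.toString()         // last field on the line
    return fields
}

println splitCsvLine('3920,10163,"ST. PAUL, MN",TWIN CITIES')
```

Whitespace around fields is preserved as-is here; call trim() on each field if the input pads its values the way the sample content above does.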
