I am trying to parse a comma separated string using:
val array = input.split(",")
Then I noticed that some input lines have a "," inside quotation marks:
data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5
*Note that the data is not very clean, so some fields are inside quotation marks while others are not.
How do I split such a line into:
array(0) = data0
array(1) = data1
array(2) = data2
array(3) = data3
array(4) = data4-1, data4-2, data4-3
array(5) = data5
As per my comments:
Parsing CSV files can be notoriously tricky because of the format's behaviour around quotes, and because commas and quotes can themselves appear inside quoted values. I recommend pulling in a library that is well regarded for dealing robustly with all the edge cases.
Options you could consider include scala-csv and traversable-csv. Or use a Java library like opencsv.
Otherwise, if you don't want to or can't use a library, you could look at this SO answer or this SO answer to see how others have tackled roll-your-own CSV parsers.
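For illustration only, a roll-your-own splitter could look like the sketch below. It is not taken from either linked answer; the names `QuoteAwareSplit` and `splitCsvLine` are made up for this example. It assumes double quotes are the only quoting character, that a doubled quote inside a quoted field stands for a literal quote, and that surrounding whitespace in every field can be trimmed. It handles a single line only, so quoted fields containing line breaks are out of scope.

```scala
object QuoteAwareSplit {
  /** Splits one CSV line on commas, ignoring commas that sit inside double quotes. */
  def splitCsvLine(line: String): Vector[String] = {
    val fields   = Vector.newBuilder[String]
    val current  = new StringBuilder
    var inQuotes = false
    var i        = 0
    while (i < line.length) {
      val c = line.charAt(i)
      if (c == '"') {
        if (inQuotes && i + 1 < line.length && line.charAt(i + 1) == '"') {
          current.append('"') // doubled quote inside a quoted field: keep one literal quote
          i += 1
        } else {
          inQuotes = !inQuotes // opening or closing quote: toggle state, drop the character
        }
      } else if (c == ',' && !inQuotes) {
        fields += current.toString.trim // separator outside quotes ends the field
        current.clear()
      } else {
        current.append(c)
      }
      i += 1
    }
    fields += current.toString.trim // last field has no trailing separator
    fields.result()
  }
}
```

Applied to the line from the question, `QuoteAwareSplit.splitCsvLine(input)` should return the six fields listed there, with `data4-1, data4-2, data4-3` kept as a single element. A library remains the safer choice for real data.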
I would recommend using a CSV library to parse CSV data - the format is a mess and painful to get right.
I would suggest kantan.csv, mainly because I'm the author but also because it lets you go a bit further than turning a CSV stream into a list of arrays of strings. Take, for example, the following input:
1,Foo,2.0
2,Bar,false
Using kantan.csv, you can write:
import java.io.File
import kantan.csv._
import kantan.csv.ops._
new File("path/to/csv").asUnsafeCsvRows[(Int, String, Either[Float, Boolean])](',', false)
Calling toList on the result will yield:
List((1,Foo,Left(2.0)), (2,Bar,Right(false)))
Note how the last column is either a float or a boolean, but this is captured in the type of each element of the iterator.
Below is my solution for parsing a CSV row (note that it splits on ";" and does not handle separators inside quoted fields):
String[] res = row.split(";");
for (int i = 0; i < res.length; i++) {
    res[i] = deQuotes(res[i]);
}
return res;
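The quotes are removed with a regular expression (this needs java.util.regex.Matcher and java.util.regex.Pattern):
// Matches a field that is wrapped in double quotes and captures its content.
static final Pattern PATTERN_DE_QUOTES = Pattern.compile("(?i)^\\\"(.*)\\\"$");
static String deQuotes(String s) {
    Matcher matcher;
    if ((matcher = PATTERN_DE_QUOTES.matcher(s)).find()) {
        // Collapse doubled quotes inside the field into single quotes.
        return matcher.group(1).replaceAll("\"\"", "\"");
    }
    return s;
}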
I hope it will help you.
You can actually split that line with a regular expression.
val s = """data0, "data1", data2, data3, "data4-1, data4-2, data4-3", data5"""
"""((".*?")|('.*?')|[^"',]+)+""".r.findAllIn(s).foreach(println)
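Each match from this expression still carries the leading space and the surrounding quote characters, so a small clean-up step is needed to get the values asked for in the question. The sketch below is one way to do that; the names `RegexFields` and `fields` are made up for this example. Be aware that this regex approach skips empty fields entirely (for example `a,,b` yields only two matches) and does not understand doubled quotes.

```scala
object RegexFields {
  // Same pattern as above: quoted runs, or runs of characters that are neither quotes nor commas.
  private val Field = """((".*?")|('.*?')|[^"',]+)+""".r

  /** Extracts the fields of one line, trimming whitespace and stripping one pair of enclosing quotes. */
  def fields(line: String): List[String] =
    Field.findAllIn(line).map { raw =>
      val t = raw.trim
      val quoted = t.length >= 2 &&
        ((t.head == '"' && t.last == '"') || (t.head == '\'' && t.last == '\''))
      if (quoted) t.substring(1, t.length - 1) else t
    }.toList
}
```

With the string `s` defined above, `RegexFields.fields(s)` should give the six cleaned values, including `data4-1, data4-2, data4-3` as one element.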
By the way, any library that can parse CSV files can also parse a single CSV line. Just wrap the string in a StringReader.