
Javascript regex splitting words in a comma separated string

I am trying to split a comma separated string using regex.

var a = 'hi,mr.007,bond,12:25PM'; //there are no white spaces between commas
var b = /(\S+?),(?=\S|$)/g;
b.exec(a); // does not catch the last item.

Any suggestions for catching all the items?

Use a negated character class:

/([^,]+)/g

will match groups of non-commas.

< a = 'hi,mr.007,bond,12:25PM'
> "hi,mr.007,bond,12:25PM"
< b=/([^,]+)/g
> /([^,]+)/g
< a.match(b)
> ["hi", "mr.007", "bond", "12:25PM"]
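The question calls b.exec(a) once, and exec returns only one match per call, even with the g flag. A minimal sketch of collecting every item by calling exec in a loop, assuming the same input string (the names re and items are illustrative):

```javascript
const a = 'hi,mr.007,bond,12:25PM';
const re = /([^,]+)/g;

// With the g flag, each exec() call resumes from re.lastIndex and
// returns the next match, or null once no further match is found.
const items = [];
let m;
while ((m = re.exec(a)) !== null) {
  items.push(m[1]); // m[1] is the first capture group
}

console.log(items); // the four comma-separated items
```

a.match(b) gives the same list in one call; the loop is mainly useful when you also need each match's index or other groups.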

Why not just use .split?

>'hi,mr.007,bond,12:25PM'.split(',')
["hi", "mr.007", "bond", "12:25PM"]
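One difference worth knowing when choosing between .split and a [^,]+ regex is how empty fields are treated. A small sketch with a made-up input containing an empty field (the variable names are illustrative):

```javascript
const line = 'a,,b';

// split keeps the empty field between the two commas
const viaSplit = line.split(',');

// [^,]+ needs at least one character, so the empty field is skipped
const viaMatch = line.match(/[^,]+/g);

console.log(viaSplit.length); // 3
console.log(viaMatch.length); // 2
```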

If you must use regex for some reason:

str.match(/(\S+?)(?:,|$)/g)
["hi,", "mr.007,", "bond,", "12:25PM"]

(note the inclusion of commas).
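If the trailing commas are unwanted, one option is to read the capture group instead of the whole match. A sketch using String.prototype.matchAll (ES2020), assuming the same input; the name words is illustrative:

```javascript
const str = 'hi,mr.007,bond,12:25PM';

// matchAll yields one match object per hit; index 1 is the capture
// group (\S+?), which excludes the trailing comma.
const words = Array.from(str.matchAll(/(\S+?)(?:,|$)/g), (m) => m[1]);

console.log(words);
```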

If you are parsing a CSV file, some of your values may be wrapped in double quotes, so you may need something a little more complicated. For example, in Java:

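import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Each match is one column; group 1 holds the column's value.
Pattern splitCommas = Pattern.compile("(?:^|,)((?:[^\",]|\"[^\"]*\")*)");

Matcher m = splitCommas.matcher("11,=\"12,345\",ABC,,JKL");

while (m.find()) {
    System.out.println(m.group(1));
}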

or in Groovy:

java.util.regex.Pattern.compile('(?:^|,)((?:[^",]|"[^"]*")*)')
        .matcher("11,=\"12,345\",ABC,,JKL")
            .iterator()
                .collect { it[1] }

This code handles:

  • blank lines (with no values or commas on them)
  • empty columns, including the last column being empty
  • values wrapped in double quotes, including commas inside the double quotes

It does not handle two double quotes used to escape a double quote itself.

The pattern consists of:

  • (?:^|,) matches the start of the line or the comma after the previous column, but does not add it to the group

  • ((?:[^",]|"[^"]*")*) matches the value of the column, and consists of:

    • a capturing group, which collects zero or more items that are each either:

      • [^",] a single character that is not a comma or a double quote
      • "[^"]*" a double quote, followed by zero or more characters that are not double quotes, followed by a closing double quote
    • those two alternatives are or-ed together using a non-capturing group: (?:[^",]|"[^"]*")

    • a * repeats the above any number of times: (?:[^",]|"[^"]*")*
    • the result is wrapped in a capturing group to give the column's value: ((?:[^",]|"[^"]*")*)
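Since the question is about JavaScript, here is a sketch of the same pattern as a JavaScript regex literal, run against the sample input from the Java example. The names splitCommas, line and columns are illustrative, and matchAll assumes an ES2020 runtime:

```javascript
// JavaScript port of the Java pattern; inside a regex literal the
// double quotes need no escaping.
const splitCommas = /(?:^|,)((?:[^",]|"[^"]*")*)/g;

const line = '11,="12,345",ABC,,JKL';

// Each match is one column; capture group 1 is the column's value.
const columns = Array.from(line.matchAll(splitCommas), (m) => m[1]);

console.log(columns);
// five columns: 11 | ="12,345" | ABC | (empty) | JKL
```

One case worth testing separately in this sketch is a line that begins with a comma: the empty match at position 0 makes the iteration step past that comma, so the column after it is not picked up.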

Escaping of double quotes is left as an exercise for the reader.
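One possible way to approach that exercise, sketched in JavaScript rather than Java: for a well-formed field, the pattern already captures doubled quotes as a run of adjacent quoted segments, so the remaining work is post-processing the captured value. The helpers unquote and splitCsvLine are hypothetical names, and the sketch assumes a field counts as quoted only when it both starts and ends with a double quote:

```javascript
// Same pattern as above, as a JavaScript regex literal.
const FIELD = /(?:^|,)((?:[^",]|"[^"]*")*)/g;

// Hypothetical helper: strip the wrapping quotes and collapse each
// doubled quote ("") into a single one. Other fields pass through.
function unquote(field) {
  if (field.length >= 2 && field.startsWith('"') && field.endsWith('"')) {
    return field.slice(1, -1).replace(/""/g, '"');
  }
  return field;
}

// Hypothetical helper: split one CSV line and unquote each column.
function splitCsvLine(line) {
  return Array.from(line.matchAll(FIELD), (m) => unquote(m[1]));
}

console.log(splitCsvLine('1,"say ""hi"", then go",x'));
// three columns: 1 | say "hi", then go | x
```

A value such as ="12,345" is left untouched by this sketch because it does not start with a double quote.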


 