I am trying to use a regex
to split the following string:
"C=US,ST=NY,O=GOOGLE\, INC"
The intention is for O=GOOGLE\, INC
to stay intact after splitting on commas.
If you can do without split, you can just use a regex like this that captures field data.
Edit: modified to match spurious (stray) escapes as well.
# /(?:^|,)((?:[^,\\]*(?:\\,|\\)?)+)(?:(?=,)|$)/

(?: ^ | , )              # Leading comma or BOL
(                        # (1 start), Field data
     (?:
          [^,\\]*
          (?: \\, | \\ )?
     )+
)                        # (1 end)
(?:                      # Lookahead, comma or EOL
     (?= , )
  |  $
)
Output >>
** Grp 0 - ( pos 0 , len 4 )
C=US
** Grp 1 - ( pos 0 , len 4 )
C=US
--------------
** Grp 0 - ( pos 4 , len 6 )
,ST=NY
** Grp 1 - ( pos 5 , len 5 )
ST=NY
--------------
** Grp 0 - ( pos 10 , len 15 )
,O=GOOGLE\, INC
** Grp 1 - ( pos 11 , len 14 )
O=GOOGLE\, INC
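As a rough sketch of how the pattern above could be applied, here it is in JavaScript, assuming that is the target language. The helper name captureFields is hypothetical, and the compact form of the regex is used with the global flag so that every field is visited.

```javascript
// Field-capturing pattern from the answer above, compact form, global flag added.
const FIELD_RE = /(?:^|,)((?:[^,\\]*(?:\\,|\\)?)+)(?:(?=,)|$)/g;

// Hypothetical helper: collects capture group 1 (the field data) of every match.
function captureFields(input) {
  return Array.from(input.matchAll(FIELD_RE), (m) => m[1]);
}

// "\\" in a JavaScript string literal is one literal backslash in the string.
const input = "C=US,ST=NY,O=GOOGLE\\, INC";
const fields = captureFields(input);
console.log(fields);
```

Each element of fields corresponds to one Grp 1 entry in the output listed above; the escaped comma stays inside the last field together with its backslash.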
Your data looks like it will fairly reliably be of the form:
foo=bar,spic=span,a=bob\,fred
i.e., pairs of key=val data, with escaped commas in the values. If the escaped commas only appear in the values, you can use a simple lookahead for the 'key=' as part of your regexp. Assuming the key is always in capitals, this works:
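s = "C=US,ST=NY,O=GOOGLE\\, INC"
s.split(/,(?=[A-Z]*=)/)
i.e., split on a comma only if it is followed by capital letters and an equals sign. Note the doubled backslash in the string literal: a lone \, inside a double-quoted literal silently drops the backslash, so the literal needs \\ to produce the single backslash that is in the data.
This will give you
["C=US", "ST=NY", "O=GOOGLE\\, INC"]
where the last element, written as a literal, is the string O=GOOGLE\, INC.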
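A minimal JavaScript sketch of this lookahead split, under the same assumption that keys consist only of capital letters. The helper name splitPairs and the follow-up step that breaks each pair at its first equals sign are illustrative additions, not part of the original answer.

```javascript
// Split only on commas that begin a new KEY= pair (keys assumed to be capitals).
const PAIR_SEPARATOR = /,(?=[A-Z]*=)/;

// Hypothetical helper, not part of any library.
function splitPairs(text) {
  return text.split(PAIR_SEPARATOR);
}

// "\\" in the literal is a single backslash in the actual string.
const s = "C=US,ST=NY,O=GOOGLE\\, INC";
const parts = splitPairs(s);

// Break each pair at its first "=" into [key, value].
const pairs = parts.map((part) => {
  const eq = part.indexOf("=");
  return [part.slice(0, eq), part.slice(eq + 1)];
});
console.log(parts, pairs);
```

The comma after the backslash is left alone because what follows it (" INC") is not a run of capitals ending in an equals sign. A value that itself contains a comma followed by capitals and "=" would still be split, so this approach depends on the data staying in the assumed shape.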