
Split on spaces while preserving double-quoted JS strings: from 'a "b \\" c" d' to ['a','"b \\" c"','d']

I am currently building a small text editor for a custom file format. I have a GUI, but I also implemented a small output console. What I want to achieve is to add a very basic input field to execute commands and pass parameters. A command would look like this:

compile test.json output.bin -location "Paris, France" -author "Charles \"Demurgos\""

My problem is getting an array of the space-separated arguments while preserving the double-quoted parts. Each quoted part might be a string generated by JSON.stringify, so it can contain escaped double quotes.

To be clear, the expected array for the previous command is:

[
    'compile',
    'test.json',
    'output.bin',
    '-location',
    '"Paris, France"',
    '-author',
    '"Charles \\"Demurgos\\""'
]

Then I can iterate over this array and apply JSON.parse to every item for which indexOf('"') == 0 to get the final result:

[
    'compile',
    'test.json',
    'output.bin',
    '-location',
    'Paris, France',
    '-author',
    'Charles "Demurgos"'
]
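The post-processing step described above could be sketched roughly like this. The helper name unquoteArgs is hypothetical, and the tokens are assumed to have already been split correctly:

```javascript
// Hypothetical helper: decode tokens that start with a double quote
// via JSON.parse and leave every other token untouched.
function unquoteArgs(tokens) {
  return tokens.map(function (token) {
    return token.indexOf('"') === 0 ? JSON.parse(token) : token;
  });
}

// The expected intermediate array, written as JS string literals.
var tokens = [
  'compile', 'test.json', 'output.bin',
  '-location', '"Paris, France"',
  '-author', '"Charles \\"Demurgos\\""'
];

var args = unquoteArgs(tokens);
console.log(args);
```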

Thanks to this question: Split a string by commas but ignore commas within double-quotes using Javascript, I was able to get what I need as long as the quoted arguments do NOT contain any double quotes themselves. Here is the regex I got:

/(".*?"|[^"\s]+)(?=\s*|\s*$)/g

But it exits the current parameter as soon as it encounters a double quote, even an escaped one. How can I adapt this regex to distinguish escaped from unescaped double quotes? And what about edge cases such as action "windowsDirectory\\" otherArg? Here the backslash is itself escaped, so the double quote that follows it should still end the argument. This is a problem I have avoided for as long as possible in previous projects, but I feel it's time for me to learn how to take escape characters into account properly.

Here is a JS-Fiddle: http://jsfiddle.net/GwY8Y/1/ You can see that the beginning is parsed correctly, but the last argument is split in the wrong places.
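For reference, the failure can be reproduced outside the fiddle with a shortened command. This is only a sketch; the variable names are illustrative, and the regex is the one above written with single backslashes (\s):

```javascript
// '-author "Charles \"Demurgos\""' as typed by the user;
// each backslash is doubled inside the JS string literal.
var input = '-author "Charles \\"Demurgos\\""';

// The lazy ".*?" stops at the first double quote it meets, escaped or not.
var naivePattern = /(".*?"|[^"\s]+)(?=\s*|\s*$)/g;

var broken = input.match(naivePattern);
console.log(broken);
// The quoted argument is cut into three pieces:
// "Charles \"   then   Demurgos\   then   ""
```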

Thank you for any help.

This regex will give you the strings you need (see demo):

"(?:\\"|\\\\|[^"])*"|\S+

Use it like this:

your_array = subject.match(/"(?:\\"|\\\\|[^"])*"|\S+/g);
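Applied to the command from the question, this might look like the sketch below. The name tokenPattern is only illustrative, and the command is written as a JS string literal, so each backslash is doubled:

```javascript
// The command from the question as a JS string literal.
var subject = 'compile test.json output.bin -location "Paris, France" ' +
              '-author "Charles \\"Demurgos\\""';

// Either a double-quoted string (allowing \" and \\ inside it)
// or a run of non-whitespace characters.
var tokenPattern = /"(?:\\"|\\\\|[^"])*"|\S+/g;

var your_array = subject.match(tokenPattern);
console.log(your_array);
// Seven tokens; the last one is the whole quoted author string,
// including its escaped inner quotes.
```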

Regex explanation

"                        # '"'
(?:                      # group, but do not capture (0 or more times
                         # (matching the most amount possible)):
  \\                     #   '\'
  "                      #   '"'
 |                       #  OR
  \\\\                   #   two backslashes
 |                       #  OR
  [^"]                   #   any character except: '"'
)*                       # end of grouping
"                        # '"'
|                        # OR
\S+                      # non-whitespace (all but \n, \r, \t, \f,
                         # and " ") (1 or more times (matching the
                         # most amount possible))
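As a quick check of the edge case raised in the question (an escaped backslash right before the closing quote), the pattern can be run on that input. Again this is only a sketch with illustrative variable names:

```javascript
// 'action "windowsDirectory\\" otherArg' as typed by the user;
// each backslash is doubled inside the JS string literal.
var edgeCase = 'action "windowsDirectory\\\\" otherArg';

var parts = edgeCase.match(/"(?:\\"|\\\\|[^"])*"|\S+/g);
console.log(parts);

// The \\\\ alternative consumes the escaped backslash as a pair, so the
// quote that follows closes the argument. The token is still valid JSON
// and decodes to a string ending in a single backslash.
var decoded = JSON.parse(parts[1]);
console.log(decoded);
```

Because the alternatives are tried in order, the two-character escapes are consumed before the [^"] fallback gets a chance to match a lone backslash.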
