
Parse string into command and args in JavaScript

I need to parse strings intended for cross-spawn.

From the following strings:

cmd foo bar
cmd "foo bar" --baz boom
cmd "baz \"boo\" bam"
cmd "foo 'bar bud' jim" jam
FOO=bar cmd baz

To an object:

{command: 'cmd', args: ['foo', 'bar']}
{command: 'cmd', args: ['foo bar', '--baz', 'boom']}
{command: 'cmd', args: ['baz "boo" bam']}
{command: 'cmd', args: ['foo \'bar bud\' jim', 'jam']}
{command: 'cmd', args: ['baz'], env: {FOO: 'bar'}}

I'm thinking a regex would be possible, but I'd love to avoid writing something custom. Anyone know of anything existing that could do this?

Edit

The question and answers are still valuable, but for my specific use-case I no longer need to do this. I'll use spawn-command instead (more accurately, I'll use spawn-command-with-kill), which doesn't require the command and args to be separate. This will make life much easier for me. Thanks!

You could roll your own with regex, but I'd strongly recommend looking at either:

  • minimist by Substack, or
  • yargs which is a more comprehensive implementation of argument parsing for node

Both are battle-hardened and well supported; minimist gets about 30 million downloads a month while yargs gets nearly half that.

It's very likely you can use one or the other to get the CLI syntax you want, with the exception of env support, which IMO should be handled separately (I can't imagine why you'd want to be opinionated about environment variables being set as part of the command).

A regular expression could match your command line...

^\s*(?:((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|(?:\\.)|\S)+)\s*)*$

... but you wouldn't be able to extract individual words. Instead, you need to match the next word repeatedly and accumulate each one into a list of arguments.

function parse_cmdline(cmdline) {
    // Matches the next argument (group 1) and the rest of the line (group 2).
    // An argument is a run of double-quoted parts, single-quoted parts,
    // backslash escapes, and other non-whitespace characters.
    var re_next_arg = /^\s*((?:(?:"(?:\\.|[^"])*")|(?:'[^']*')|\\.|\S)+)\s*(.*)$/;
    var next_arg = ['', '', cmdline];
    var args = [];
    var quoted_part;
    while ((next_arg = re_next_arg.exec(next_arg[2]))) {
        var quoted_arg = next_arg[1];
        var unquoted_arg = "";
        while (quoted_arg.length > 0) {
            if ((quoted_part = /^"((?:\\.|[^"])*)"(.*)$/.exec(quoted_arg))) {
                // Double-quoted part: drop the quotes and unescape \x to x.
                unquoted_arg += quoted_part[1].replace(/\\(.)/g, "$1");
                quoted_arg = quoted_part[2];
            } else if ((quoted_part = /^'([^']*)'(.*)$/.exec(quoted_arg))) {
                // Single-quoted part: drop the quotes, keep the content verbatim.
                unquoted_arg += quoted_part[1];
                quoted_arg = quoted_part[2];
            } else if (/^\\./.test(quoted_arg)) {
                // Backslash escape outside quotes: keep the escaped character.
                unquoted_arg += quoted_arg[1];
                quoted_arg = quoted_arg.substring(2);
            } else {
                // Ordinary character (or an unmatched quote or trailing backslash).
                unquoted_arg += quoted_arg[0];
                quoted_arg = quoted_arg.substring(1);
            }
        }
        args.push(unquoted_arg);
    }
    return args;
}
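
The array returned by parse_cmdline still has to be mapped onto the {command, args, env} shape from the question. The following is only a sketch of one way to do that: toSpawnObject is a hypothetical helper name, and it assumes that every leading token of the form NAME=value is an environment assignment. Because it runs after quotes have been stripped, it cannot tell a quoted "FOO=bar" from a bare one.

```javascript
// Hypothetical helper (not part of any library): turns a token array, such as
// the output of parse_cmdline, into the { command, args, env } shape from the
// question.
function toSpawnObject(tokens) {
    var env = {};
    var i = 0;
    var assignment;
    // Assumption: leading NAME=value tokens are environment assignments.
    while (i < tokens.length &&
           (assignment = /^([A-Za-z_][A-Za-z0-9_]*)=([\s\S]*)$/.exec(tokens[i]))) {
        env[assignment[1]] = assignment[2];
        i++;
    }
    // The first remaining token is the command, the rest are its arguments.
    var result = { command: tokens[i], args: tokens.slice(i + 1) };
    if (i > 0) {
        result.env = env; // only add env when at least one assignment was seen
    }
    return result;
}
```

With this, toSpawnObject(parse_cmdline('FOO=bar cmd baz')) gives {command: 'cmd', args: ['baz'], env: {FOO: 'bar'}}.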

You could use raw regular expressions, but what you're building is called a tokenizer. You want a tokenizer so that you can handle context, such as quoted strings that contain spaces you don't want to split on.
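
To make the idea concrete, here is a minimal sketch of a hand-written tokenizer that walks the input one character at a time and tracks whether it is inside a quote. The function name tokenize and its escape rules (backslash escapes work outside quotes and inside double quotes, single quotes are literal) are assumptions chosen for illustration, not the behavior of any particular shell or library.

```javascript
// Hypothetical tokenizer sketch: splits on whitespace except inside quotes.
function tokenize(input) {
    var tokens = [];
    var current = '';
    var quote = null;     // the active quote character, or null outside quotes
    var inToken = false;  // true once a token has started (so "" yields '')
    for (var i = 0; i < input.length; i++) {
        var ch = input[i];
        if (quote) {
            if (ch === quote) {
                quote = null;                  // closing quote
            } else if (ch === '\\' && quote === '"' && i + 1 < input.length) {
                current += input[++i];         // escape inside double quotes
            } else {
                current += ch;                 // literal character inside quotes
            }
        } else if (ch === '"' || ch === "'") {
            quote = ch;                        // opening quote
            inToken = true;
        } else if (ch === '\\' && i + 1 < input.length) {
            current += input[++i];             // escape outside quotes
            inToken = true;
        } else if (/\s/.test(ch)) {
            if (inToken) {                     // whitespace ends the current token
                tokens.push(current);
                current = '';
                inToken = false;
            }
        } else {
            current += ch;
            inToken = true;
        }
    }
    if (quote) {
        throw new Error('Unterminated ' + quote + ' quote');
    }
    if (inToken) {
        tokens.push(current);
    }
    return tokens;
}
```

Keeping the quote state in a variable is what lets the space in "foo bar" stay inside one token, which a plain split on whitespace cannot do.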

There are existing generic libraries designed specifically for parsing and tokenization that can handle cases like strings, blocks, etc.

https://www.npmjs.com/package/js-parse

Additionally, most of these command line formats and config file formats already have parsers/tokenizers. You might want to leverage those and then normalize the results from each into your object structure.

