Match text between single quotes, double quotes, or no quotes at all

I'm trying to parse CLI-like arguments that could be enclosed in single quotes, double quotes, or no quotes at all.
Here's an example of what I'm trying to get:

// --message "This is a 'quoted' message" --other 'This uses the "other" quotes'
const str = "--message \"This is a 'quoted' message\" --other 'This uses the \"other\" quotes'"

matchGitArgs(str) // ['--message', 'This is a \'quoted\' message', '--other', 'This uses the "other" quotes']

I've found a lot of similar questions, so here is what makes this one different from them:

  • It's important that it also matches the arguments that are not in quotes, and keeps the original order
  • It should be able to parse single-quoted and double-quoted arguments in the same string
  • It should not match the quotes themselves:
matchGitArgs('This is "quoted"')
// Correct: ['This', 'is', 'quoted']
// Wrong: ['This', 'is', '"quoted"']
  • It should allow escaped quotes and the other kind of quotes inside a quoted argument:
matchGitArgs('It is "ok" to use \'these\'')
// ["It", "is", "ok", "to", "use", "these"]

I've tried a lot of different regex patterns I've found here, but each of them failed at least one of these conditions. I've also tried libraries meant to parse CLI arguments, but it seems they all rely on process.argv (in Node.js), which is already split correctly based on the quotes, so they don't help me.
What I essentially need to do is generate an array like process.argv.

It doesn't need to be a single regex; a JS/TS function that does the same is fine too.

"Verbose" expressions and named groups work especially well for tokenizing problems:

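function* parseArgs(cmdLine) {
    // Verbose pattern: all whitespace is stripped before compiling,
    // so the layout below is purely for readability.
    const re = String.raw`
        ( -- (?<longOpt> \w+) (\s+ | =) )
        | ( - (?<shortOpt> \w+) \s+ )
        | ( (' (?<sq> (\\. | [^'])* ) ') \s+ )
        | ( (" (?<dq> (\\. | [^"])* ) ") \s+ )
        | ( (?<raw> [^\s"'-]+) \s+ )
        | (?<error> \S)
    `.replace(/\s+/g, '');

    // A trailing space is appended so the last token also ends in \s+.
    // matchAll compiles a string pattern with the g flag.
    for (let m of (cmdLine + ' ').matchAll(re)) {
        // Exactly one named group is non-empty per match; it identifies the token type.
        let g = Object.entries(m.groups).filter(p => p[1]);
        let [type, val] = g[0];
        switch (type) {
            case 'error':
                throw new Error(`Unexpected character at index ${m.index}`);
            case 'sq':
            case 'dq':
                // Drop each escaping backslash but keep the character it escapes.
                yield ['value', val.replace(/\\(.)/g, '$1')];
                break;
            case 'raw':
                yield ['value', val];
                break;
            case 'longOpt':
            case 'shortOpt':
                yield ['option', val];
        }
    }
}

// Usage:

const args = String.raw`
    --message "This is \"a\" 'quoted' message"
    -s
    --longOption 'This uses the "other" quotes'
    --foo 1234
    --file=message.txt
    --file2="Application Support/message.txt"
`

for (let [type, s] of parseArgs(args))
    console.log(type, ':', s)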
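If all you need is the flat, process.argv-style array from the question, without distinguishing options from values, a smaller pattern may be enough. The following is only a sketch under a few assumptions: the name matchGitArgs is taken from the question, backslash escapes are honored only inside quotes, and an unterminated quote is silently skipped rather than reported. Unlike a real shell, it also splits --file="a b" into two tokens (--file= and a b).

```javascript
// Sketch of the matchGitArgs helper named in the question (not a full shell parser).
function matchGitArgs(str) {
  // Three alternatives, tried in order at each position:
  //   1. "double quoted"  (backslash escapes allowed inside)
  //   2. 'single quoted'  (backslash escapes allowed inside)
  //   3. a bare run of characters that are neither whitespace nor quotes
  const re = /"((?:\\.|[^"\\])*)"|'((?:\\.|[^'\\])*)'|([^\s"']+)/g;
  const args = [];
  for (const m of str.matchAll(re)) {
    if (m[3] !== undefined) {
      // Unquoted token: keep it verbatim.
      args.push(m[3]);
    } else {
      // Quoted token: take the text between the quotes and
      // drop each escaping backslash, keeping the escaped character.
      const quoted = m[1] !== undefined ? m[1] : m[2];
      args.push(quoted.replace(/\\(.)/g, '$1'));
    }
  }
  return args;
}

const str = "--message \"This is a 'quoted' message\" --other 'This uses the \"other\" quotes'";
console.log(matchGitArgs(str));
// ['--message', "This is a 'quoted' message", '--other', 'This uses the "other" quotes']
```

Because matchAll reports matches left to right, the original order of quoted and unquoted arguments is preserved.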
