Optional parts in JavaScript regular expression (with capture groups)

I have a question about how to implement optional parts in a regular expression. As an example, I'm parsing good old text adventure input, which illustrates the task well. Here is what I'm after:

var exp = /^([a-z]+)(?:\s([a-z0-9\s]+)\s(on|with)\s([a-z\s]+))?$/i;

var strings = [
    "look",
    "take key",
    "take the key",
    "put key on table",
    "put the key on the table",
    "open the wooden door with the small rusty key"
];

for (var i=0; i < strings.length;i++) {
    var match = exp.exec(strings[i]);

    if (match) {
        var verb = match[1];
        var directObject = match[2];
        var preposition = match[3];
        var indirectObject = match[4];

        console.log("String: " + strings[i]);
        console.log("  Verb: " + verb);
        console.log("  Direct object: " + directObject);
        console.log("  Preposition: " + preposition);
        console.log("  Indirect object: " + indirectObject);    
    } else {
        console.log("String is not a match: " + strings[i]);
    }
    console.log(match);
}

My regular expression works for the first string and the last three. "take key" and "take the key" do not match, because the optional group requires a preposition ("on" or "with").

I know how to get the correct result using other methods (like .split()). This is an attempt to learn regular expressions, so I'm not looking for an alternative way to do this :-)

I have tried adding more optional non-capturing groups, but I couldn't get it to work:

var exp = /^([a-z]+)(?:\s([a-z0-9\s]+)(?:\s(on|with)\s([a-z\s]+))?)?$/i;

This works for the first three strings, but not for the last three.
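
A minimal sketch of why this second attempt falls short on the longer inputs: the class [a-z0-9\s]+ also matches spaces and the words "on" and "with", so the greedy quantifier consumes the rest of the line and the optional preposition group ends up matching nothing. The names greedyExp and greedyMatch are only for illustration.

```javascript
// Second attempt from the question: every part after the verb is optional.
var greedyExp = /^([a-z]+)(?:\s([a-z0-9\s]+)(?:\s(on|with)\s([a-z\s]+))?)?$/i;

var greedyMatch = greedyExp.exec("put key on table");

// The string matches, but the direct object group swallowed the preposition
// and the indirect object, so groups 3 and 4 did not participate.
console.log(greedyMatch[1]); // "put"
console.log(greedyMatch[2]); // "key on table"
console.log(greedyMatch[3]); // undefined
console.log(greedyMatch[4]); // undefined
```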

So what I want is: the first word, then some characters up to a specified word (like "on"), then some characters up to the end of the string.

The tricky part is the different variants.

Can it be done?

WORKING SOLUTION:

exp = /^([a-z]+)(?:\s((?:(?!\s(?:on|with)).)*)(?:\s(on|with)\s(.*))?)?$/i;
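
As a rough check of the pattern above, the sketch below runs it against a few of the test strings. The helper parseCommand and the name workingExp are hypothetical and not part of the original post; parts that are absent from the input come back as undefined.

```javascript
// Working pattern from the question. The tempered dot (?:(?!\s(?:on|with)).)*
// consumes characters only while the upcoming text is not " on" or " with".
var workingExp = /^([a-z]+)(?:\s((?:(?!\s(?:on|with)).)*)(?:\s(on|with)\s(.*))?)?$/i;

// Hypothetical helper: returns the four parts, or null when nothing matches.
function parseCommand(input) {
    var m = workingExp.exec(input);
    if (!m) {
        return null;
    }
    return {
        verb: m[1],
        directObject: m[2],
        preposition: m[3],
        indirectObject: m[4]
    };
}

console.log(parseCommand("look"));
console.log(parseCommand("take the key"));
console.log(parseCommand("put the key on the table"));
console.log(parseCommand("open the wooden door with the small rusty key"));
```

One thing to be aware of: the lookahead has no word boundary, so a word that merely starts with "on" or "with" (for example "only") also stops the direct object, and an input such as "take the only key" does not match at all. Adding \b after the alternation inside the lookahead is one possible refinement.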

Perhaps a regex like this:

var exp = /^([a-z]+)(?:(?:(?!\s(?:on|with))(\s[a-z0-9]+))+(?:\s(?:on|with)(\s[a-z0-9]+)+)?)?$/i;

The group \s[a-z0-9]+ captures a word preceded by a space.

(?!\s(?:on|with)) prevents this word from being "on" or "with".

Thus (?:(?!\s(?:on|with))(\s[a-z0-9]+))+ matches the list of words before "on" or "with".
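
A short sketch of what this pattern actually captures, using one of the question's test strings. The names wordExp and wordMatch are only for illustration. Because the capturing groups sit inside repeated groups, each one keeps only the word from its last iteration, and the preposition itself is not captured.

```javascript
// Pattern from the answer: each word is matched together with its leading space.
var wordExp = /^([a-z]+)(?:(?:(?!\s(?:on|with))(\s[a-z0-9]+))+(?:\s(?:on|with)(\s[a-z0-9]+)+)?)?$/i;

var wordMatch = wordExp.exec("put the key on the table");

// A capture group inside a repeated group keeps only its last iteration,
// and the captured word still includes the leading space.
console.log(wordMatch[1]); // "put"
console.log(wordMatch[2]); // " key"
console.log(wordMatch[3]); // " table"
```

So this variant validates the sentence structure, but the full direct and indirect objects would still need to be recovered separately, for example with the tempered-dot pattern from the question's working solution.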

