
How to make regex match only first occurrence of each match?

/\b(keyword|whatever)\b/gi

How can I modify the above JavaScript regex to match only the first occurrence of each word (I believe this is called non-greedy)?

That is, the first occurrence of "keyword" and the first occurrence of "whatever", and I may put more words in there.

Remove the g flag from the regex:

/\b(keyword|whatever)\b/i
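To see what dropping the flag does in practice, here is a minimal sketch. The sample text is made up for illustration. Note that without g the regex yields one match in total (whichever alternative appears first in the text), not one match per word.

```javascript
// Sample text invented for this illustration.
const text = "Whatever you say: keyword, whatever, keyword.";

// With the g flag, match() returns every occurrence of either word.
const allMatches = text.match(/\b(keyword|whatever)\b/gi);

// Without the g flag, match() stops at the first match in the text.
const firstMatch = text.match(/\b(keyword|whatever)\b/i);

console.log(allMatches.length); // 4
console.log(firstMatch[0]);     // "Whatever"
console.log(firstMatch.index);  // 0
```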

What you're doing is simply unachievable with a single regular expression. Instead, you will have to store every word you wish to find in an array, loop through them all, searching the text for each one, and store any matches in a results array.

Example:

var words = ["keyword","whatever"];
var text = "Whatever, keywords are like so, whatever... Unrelated, I now know " +
           "what it's like to be a tweenage girl. Go Edward.";
var matches = []; // An empty array to store results in.
/* The text is converted to lower case so that the search is case-insensitive
   (the entries in `words` must therefore be lower case too).
 * The built-in method String.prototype.indexOf(needle) is used to find the
   strings, as it avoids the need to escape the input for regular expression
   metacharacters. */

var lowerCaseText = text.toLowerCase();
for (var i = 0; i < words.length; i++) { // Loop through the `words` array
    // indexOf returns -1 if no match is found
    if (lowerCaseText.indexOf(words[i]) !== -1) {
        matches.push(words[i]); // Add to the `matches` array
    }
}
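The same loop can be wrapped in a function that also records where each word first appears. This is only a sketch: the function name firstOccurrences and the shape of the returned objects are assumptions made for illustration, not part of the answer above. Because indexOf does a plain substring search, "keyword" is also found inside "keywords".

```javascript
// Hypothetical helper: for each word, report the index of its first
// (case-insensitive, substring) occurrence in the text.
function firstOccurrences(text, words) {
    const lowerCaseText = text.toLowerCase();
    const found = [];
    for (const word of words) {
        // indexOf returns the position of the first occurrence, or -1.
        const index = lowerCaseText.indexOf(word.toLowerCase());
        if (index !== -1) {
            found.push({ word: word, index: index });
        }
    }
    return found;
}

const sample = "Whatever, keywords are like so, whatever...";
const result = firstOccurrences(sample, ["keyword", "whatever", "missing"]);
console.log(result);
// [ { word: 'keyword', index: 10 }, { word: 'whatever', index: 0 } ]
```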

Remove the g modifier from your regex. Then it will find only one match.

What you're talking about can't be done with a JavaScript regex. It might be possible with advanced regex features like .NET's unrestricted lookbehind, but JavaScript's feature set is extremely limited. And even in .NET, it would probably be simplest to create a separate regex for each word and apply them one by one; in JavaScript it's your only option.
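The "separate regex for each word" approach might look like the sketch below. The helper names escapeRegExp and firstMatchPerWord are hypothetical, as is the sample text. Each word gets its own case-insensitive regex without the g flag, so exec() returns only that word's first whole-word match.

```javascript
// Hypothetical helper: escape regex metacharacters so a word is matched literally.
function escapeRegExp(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one regex per word and apply them one by one.
function firstMatchPerWord(text, words) {
    const results = [];
    for (const word of words) {
        // No g flag: exec() always reports the first match in the text.
        const re = new RegExp("\\b" + escapeRegExp(word) + "\\b", "i");
        const m = re.exec(text);
        if (m !== null) {
            results.push({ word: word, match: m[0], index: m.index });
        }
    }
    return results;
}

const perWord = firstMatchPerWord(
    "Whatever, keywords are like so, whatever... one keyword here.",
    ["keyword", "whatever"]
);
console.log(perWord);
// [ { word: 'keyword', match: 'keyword', index: 48 },
//   { word: 'whatever', match: 'Whatever', index: 0 } ]
```

Unlike the indexOf version, the \b anchors mean "keywords" is skipped and the first standalone "keyword" is reported.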

Greediness only applies to regexes that employ quantifiers, like /START.*END/. The . means "any character" (other than a line break) and the * means "zero or more". After START is located, the .* greedily consumes the rest of the text. Then it starts backtracking, "giving back" one character at a time until the next part of the regex, END, succeeds in matching.
We call this regex "greedy" because it matches everything from the first occurrence of START to the last occurrence of END.
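A small sketch of that greedy behavior, using a made-up input string with two START-to-END sequences:

```javascript
// Input invented for this illustration: two START ... END sequences.
const input = "START one END middle START two END";

// Greedy .* runs to the end of the text, then backtracks to the LAST "END".
const greedy = input.match(/START.*END/);

console.log(greedy[0]);
// "START one END middle START two END"
```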

If there may be more than one "START"-to-"END" sequence, and you want to match just the first one, you can append a ? to the * to make it non-greedy: /START.*?END/. Now, each time the . tries to consume the next character, it first checks to see if it could match END at that spot instead. Thus it matches from the first START to the first END after that. And if you want to match all the "START"-to-"END" sequences individually, you add the g modifier: /START.*?END/g.
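For comparison, a sketch of the non-greedy forms on a made-up input string:

```javascript
// Input invented for this illustration.
const sequenceText = "START one END middle START two END";

// Non-greedy .*? stops at the first "END" after the first "START".
const lazyFirst = sequenceText.match(/START.*?END/)[0];

// Adding the g modifier returns each sequence individually.
const lazyAll = sequenceText.match(/START.*?END/g);

console.log(lazyFirst); // "START one END"
console.log(lazyAll);   // [ 'START one END', 'START two END' ]
```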

It's a bit more complicated than that, of course. For example, what if these sequences can be nested, as in START…START…END…END? If I've gotten a little carried away with this answer, it's because understanding greediness is the first important step to mastering regexes. :-/
