How to check multiple matching words with regex in Javascript

Hey, I have code like this:

var text = "We are downing to earth";
var regexes = "earth|art|ear";
if (regexes.length) {
  var reg = new RegExp(regexes, "ig");
  var regsult;
  console.log(reg);
  while ((regsult = reg.exec(text)) !== null) {
    var word = regsult[0];
    console.log(word);
  }
}

I want to get the matching words from the text. It should return "earth", "art" and "ear", because "earth" contains those substrings as well. Instead, it only produces "earth".

Is there a mistake in my regex pattern? Or should I use another approach in JS?

Thanks

As discussed in another answer, a single regexp cannot match multiple overlapping alternatives. In your case, simply do a separate regexp test for each word you are looking for:

var text = "We are downing to earth";
var regexes = ["earth", "art", "ear"];

var results = [];
for (var i = 0; i < regexes.length; i++) {
  var word = regexes[i];
  // match() returns null when the word does not occur in the text
  if (text.match(word)) results.push(word);
}

You could tighten this up a little bit by doing:

regexes.filter(function(word) { return (text.match(word) || [])[0]; });

If your "regexes" are actually just strings, you could use indexOf and keep things simpler:

regexes.filter(function(word) { return text.indexOf(word) !== -1; });
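Putting the pieces above together, a minimal self-contained sketch might look like the following. The helper name `findWords` is hypothetical, and the lower-casing is an assumption made to mirror the `i` flag used in the question.

```javascript
// Hypothetical helper: returns every word from `words` that occurs in `text`.
// Lower-casing both sides is an assumption that mimics the "i" flag.
function findWords(text, words) {
  var haystack = text.toLowerCase();
  return words.filter(function (word) {
    return haystack.indexOf(word.toLowerCase()) !== -1;
  });
}

var found = findWords("We are downing to earth", ["earth", "art", "ear", "moon"]);
console.log(found); // logs the array "earth", "art", "ear"
```

Because each word is tested on its own, overlapping words are all reported, and no regex escaping is needed for plain strings.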

You only get earth as a match because the regex engine matched earth as the first alternative and then moved on in the source string, oblivious to the fact that it could also have matched ear or art. This is expected behavior with all regex engines: they don't try to return all possible matches, just the first one, and matches generally can't overlap.

Whether earth or ear is returned depends on the regex engine. A POSIX ERE engine will always return the leftmost, longest match, whereas most current regex engines (including JavaScript's) will return the first possible match, depending on the order of the alternatives in the regex.

So art|earth|ear would return earth, whereas ear|art|earth would return ear.
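The effect of the alternation order can be checked directly in JavaScript. This is only a small sketch; the variable names `first` and `second` are made up for illustration.

```javascript
var text = "We are downing to earth";

// Same alternatives, different order: at the leftmost position where
// anything matches, the first alternative that succeeds wins.
var first = /art|earth|ear/i.exec(text)[0];
var second = /ear|art|earth/i.exec(text)[0];

console.log(first);  // "earth"
console.log(second); // "ear"
```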

You can make the regex find overlapping matches (as long as they start at different positions in the string) by using positive lookahead assertions:

(?=(ear|earth|art))

will find ear and art, but not earth, because it starts at the same position as ear. Note that in this case you must not look at the regex's entire match (regsult[0] in your code), which is empty, but at the content of the capturing group (regsult[1]).
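A minimal sketch of reading the capturing group follows. One caveat: a lookahead-only pattern matches the empty string, and a plain exec loop like the one in the question does not advance past an empty match in JavaScript, so it would need lastIndex bumped by hand. The sketch sidesteps that by assuming an environment with String.prototype.matchAll (ES2020), which advances on its own. The variable name `overlapping` is illustrative.

```javascript
var text = "We are downing to earth";

// The lookahead itself consumes nothing, so the word is found in group 1.
// matchAll requires the g flag and steps past empty matches automatically.
var overlapping = Array.from(
  text.matchAll(/(?=(ear|earth|art))/gi),
  function (m) { return m[1]; }
);

console.log(overlapping); // logs the array "ear", "art"
```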

The only way around this that I can think of at the moment would be to use

(?=(ear(th)?|art))

which would give a result like [["", "earth", "th"], ["", "art", undefined]]. The first group holds the longest word found at that position, and a defined second group tells you that the shorter ear matched there as well.
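One way to turn those groups back into a flat list of words is sketched below. It again assumes String.prototype.matchAll is available; the `words` array and the slicing step are illustrative choices, not part of the original answer. In this sketch group 1 holds the full word matched at each position, and group 2 is only set when the optional th took part.

```javascript
var text = "We are downing to earth";
var words = [];

for (var m of text.matchAll(/(?=(ear(th)?|art))/gi)) {
  words.push(m[1]);
  // If the optional group took part, the shorter prefix matched here too.
  if (m[2] !== undefined) {
    words.push(m[1].slice(0, -m[2].length));
  }
}

console.log(words); // logs the array "earth", "ear", "art"
```

This only scales to words that share a prefix in a way you can spell out by hand; for an arbitrary word list, testing each word separately is simpler.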
