Java regex intersection with backreferences

Question

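I am trying to create a regex in Java that captures the letter pattern of a particular word, so that I can find other words with the same pattern. For example, the word "tooth" has the pattern 12213, because both 't' and 'o' repeat. I want the regex to match other words with that pattern, such as "teeth".

So here is my attempt using backreferences. In this particular example, the match should fail if the second letter is the same as the first letter. Also, the last letter should be different from all the others.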

String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("tooth");

//This works as expected
assertTrue(m.matches());

m.reset("tooto");
//This should return false, but instead returns true
assertFalse(m.matches());

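I have verified that it works for examples like "toot" if I remove the last group, i.e. with the following, so I know the backreferences are working up to that point: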

String regex = "([a-z])([a-z&&[^\1]])\\2\\1";

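But if I add the last group back onto the end of the pattern, it is as if it no longer recognizes the backreferences inside the square brackets.

Am I doing something wrong, or is this a bug?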

Answer 1

Try this:

(?i)\b(([a-z])(?!\2)([a-z])\3\2(?!\3)[a-z]+)\b

Explanation

(?i)           # Match the remainder of the regex with the options: case insensitive (i)
\b             # Assert position at a word boundary
(              # Match the regular expression below and capture its match into backreference number 1
   (              # Match the regular expression below and capture its match into backreference number 2
      [a-z]          # Match a single character in the range between “a” and “z”
   )
   (?!            # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      \2             # Match the same text as most recently matched by capturing group number 2
   )
   (              # Match the regular expression below and capture its match into backreference number 3
      [a-z]          # Match a single character in the range between “a” and “z”
   )
   \3             # Match the same text as most recently matched by capturing group number 3
   \2             # Match the same text as most recently matched by capturing group number 2
   (?!            # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      \3             # Match the same text as most recently matched by capturing group number 3
   )
   [a-z]          # Match a single character in the range between “a” and “z”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b             # Assert position at a word boundary

Code

try {
    Pattern regex = Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
        }
    } 
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Hope this helps.
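To make the loop above concrete, here is a minimal sketch that runs the same expression over a sample sentence and collects group 1 (the whole word). The class name WordScanDemo, the helper findWords, and the sample sentence are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordScanDemo {
    // Same expression as in the answer above.
    static final Pattern WORD =
            Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");

    // Hypothetical helper: returns every word in the text that the expression accepts.
    static List<String> findWords(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            hits.add(m.group(1)); // group 1 wraps the whole word
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [tooth, teeth]
        System.out.println(findWords("A tooth, two teeth, and a tooto."));
    }
}
```

Note that this expression accepts any number of trailing letters and only compares the fifth letter with the second one, so inputs such as "toothpaste" or "toott" are also reported. If the word must have exactly five letters with a unique last letter, the expression would need to be tightened.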

Answer 2

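If you print your regex you get a clue about what is wrong: the backreferences inside your character classes are consumed by the Java compiler as string escapes and turn into odd control characters, so the regex engine never sees them. (In a Java string literal, \1 is an octal escape for the character U+0001, whereas \\1 yields the two characters \1.) That is why it does not work as expected. For example: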

m.reset("ooooo");
System.out.println(m.matches());

also prints

true
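The effect of the string escape can be checked directly. The sketch below (the class name OctalEscapeDemo and the helper originalMatches are illustrative names, not from the question) shows that "\1" in Java source is a single control character, which is why the negated classes in the original pattern exclude nothing useful.

```java
import java.util.regex.Pattern;

class OctalEscapeDemo {
    // The pattern exactly as written in the question: \1 and \2 inside the
    // brackets are octal string escapes, while \\2 and \\1 are real backreferences.
    static final String ORIGINAL = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";

    // Hypothetical helper: true if the whole input matches the original pattern.
    static boolean originalMatches(String input) {
        return Pattern.matches(ORIGINAL, input);
    }

    public static void main(String[] args) {
        System.out.println("\1".length());          // prints 1
        System.out.println((int) "\1".charAt(0));   // prints 1 (the character U+0001)
        System.out.println(originalMatches("tooto")); // prints true
        System.out.println(originalMatches("ooooo")); // prints true
    }
}
```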

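Also, the && intersection cannot do what you want here, because a backreference cannot be used inside a character class; you will have to use a negative lookahead instead. This expression works for the examples above: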

String regex = "([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]";

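The expression (?!\\1) looks ahead to check that the next character is not the same as the one captured by group 1, without moving the regex cursor forward.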
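As a quick check of the lookahead version, the following sketch runs it against a few inputs. The class name LookaheadPatternDemo, the helper hasShape, and the extra test words are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class LookaheadPatternDemo {
    // The lookahead-based expression from this answer, for the shape 1-2-2-1-3.
    static final Pattern SHAPE_12213 =
            Pattern.compile("([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]");

    // Hypothetical helper: true if the whole word has the shape 12213.
    static boolean hasShape(String word) {
        return SHAPE_12213.matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(hasShape("tooth")); // prints true
        System.out.println(hasShape("teeth")); // prints true
        System.out.println(hasShape("tooto")); // prints false: last letter repeats 'o'
        System.out.println(hasShape("toott")); // prints false: last letter repeats 't'
        System.out.println(hasShape("ooooo")); // prints false: second letter equals the first
    }
}
```

Because matches() requires the entire input to match, no anchors or word boundaries are needed here; with find() on longer text, \\b on both sides would be one way to restrict hits to whole words.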

Answer 1
4 2012-07-19 05:13:46

Answer 2
4 (accepted) 2012-07-19 05:15:22
