Java regex intersection with backreference

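I'm trying to create a regex in Java that matches the letter pattern of a particular word, so that I can find other words with the same pattern. For example, the word "tooth" has the pattern 12213, since both the 't' and the 'o' repeat. I want the regex to match other words with that pattern, such as "teeth".

Here is my attempt using backreferences. In this particular example, the match should fail if the second letter is the same as the first. The last letter should also differ from all the others.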

String regex = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("tooth");

//This works as expected
assertTrue(m.matches());

m.reset("tooto");
//This should return false, but instead returns true
assertFalse(m.matches());

I've verified that it works for examples like "toot" if I remove the last group, as in the following, so I know the backreferences work up to that point:

String regex = "([a-z])([a-z&&[^\1]])\\2\\1";

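But if I add the last group back to the end of the pattern, it's as if the backreferences inside the square brackets are no longer recognized.

Am I doing something wrong, or is this a bug?

Try this: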

(?i)\b(([a-z])(?!\2)([a-z])\3\2(?!\3)[a-z]+)\b

Explanation

(?i)           # Match the remainder of the regex with the options: case insensitive (i)
\b             # Assert position at a word boundary
(              # Match the regular expression below and capture its match into backreference number 1
   (              # Match the regular expression below and capture its match into backreference number 2
      [a-z]          # Match a single character in the range between “a” and “z”
   )
   (?!            # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      \2             # Match the same text as most recently matched by capturing group number 2
   )
   (              # Match the regular expression below and capture its match into backreference number 3
      [a-z]          # Match a single character in the range between “a” and “z”
   )
   \3             # Match the same text as most recently matched by capturing group number 3
   \2             # Match the same text as most recently matched by capturing group number 2
   (?!            # Assert that it is impossible to match the regex below starting at this position (negative lookahead)
      \3             # Match the same text as most recently matched by capturing group number 3
   )
   [a-z]          # Match a single character in the range between “a” and “z”
      +              # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
\b             # Assert position at a word boundary

// Requires: import java.util.regex.*;
// subjectString is the text to search.
try {
    Pattern regex = Pattern.compile("(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
        }
    } 
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

See it in action here. Hope this helps.
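As a usage illustration, the pattern above can be wrapped in a small helper that collects every matching word from a piece of text. This is only a sketch: the class name WordPatternFinder, the method findWords, and the sample sentence are assumptions made for the example and are not part of the answer. Note that this pattern accepts one or more trailing letters and only forbids the second letter directly after the repeated block, so a word such as "toott" would also match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordPatternFinder {
    // Same pattern as in the answer, with backslashes doubled for a Java string literal.
    static final Pattern WORD = Pattern.compile(
            "(?i)\\b(([a-z])(?!\\2)([a-z])\\3\\2(?!\\3)[a-z]+)\\b");

    // Collects every whole word in the text that matches the pattern (hypothetical helper).
    static List<String> findWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group(1)); // group 1 spans the whole matched word
        }
        return words;
    }

    public static void main(String[] args) {
        // "tooth" and "teeth" match; "tooto" is rejected by the (?!\3) lookahead.
        System.out.println(findWords("A tooth, two teeth, but not tooto."));
    }
}
```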

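If you print your regex, you get a clue about what is wrong: the backreferences inside your character classes are written with a single backslash, so the Java compiler turns \1 and \2 into octal escapes (the control characters U+0001 and U+0002) before the regex engine ever sees them. The classes therefore exclude only those control characters, and the pattern does not work as expected. For example: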

m.reset("ooooo"); // the second letter equals the first, so this should not match
System.out.println(m.matches());

also prints

true
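The effect of the single backslash can be checked directly. The sketch below is an illustration only: the class name OctalEscapeDemo and its helper methods are made up for this example. It shows that "\1" in a Java string literal is a single control character, and that the question's pattern therefore accepts words it was meant to reject.

```java
public class OctalEscapeDemo {
    // The question's pattern, copied as written: \1 and \2 inside the brackets use a
    // single backslash, so the compiler reads them as octal escapes.
    static final String QUESTION_REGEX = "([a-z])([a-z&&[^\1]])\\2\\1([a-z&&[^\1\2]])";

    // Code point of the character that "\1" produces in a string literal.
    static int octalOneCodePoint() {
        return "\1".charAt(0);
    }

    // Whether the question's pattern matches the whole input.
    static boolean questionRegexMatches(String input) {
        return input.matches(QUESTION_REGEX);
    }

    public static void main(String[] args) {
        System.out.println("\1".length());                 // 1: one character, not '\' followed by '1'
        System.out.println(octalOneCodePoint());           // 1: the control character U+0001
        System.out.println(questionRegexMatches("tooto")); // true, although the last letter repeats
        System.out.println(questionRegexMatches("ooooo")); // true, although all letters are equal
    }
}
```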

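Also, the && intersection does not help here: a backreference is not recognized inside a character class, so you have to use a lookahead instead. This expression works for your example above: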

String regex = "([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]";

The expression (?!\\1) looks ahead to check that the next character is not the one captured by the first group, without moving the regex cursor forward.
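To see the lookahead version behave as intended, it can be run against the words from the question plus a few near misses. This is a minimal sketch; the class name LookaheadPatternDemo, the method hasPattern12213, and the extra test words are assumptions added for illustration.

```java
import java.util.regex.Pattern;

public class LookaheadPatternDemo {
    // Pattern 12213: letter 1, a different letter 2, letter 2 again, letter 1 again,
    // then a final letter that is neither letter 1 nor letter 2.
    static final Pattern PATTERN_12213 =
            Pattern.compile("([a-z])(?!\\1)([a-z])\\2\\1(?!(\\1|\\2))[a-z]");

    // True if the whole word follows the 12213 pattern (hypothetical helper).
    static boolean hasPattern12213(String word) {
        return PATTERN_12213.matcher(word).matches();
    }

    public static void main(String[] args) {
        // Expected: tooth and teeth match; tooto, toott and ooooo do not.
        for (String word : new String[] {"tooth", "teeth", "tooto", "toott", "ooooo"}) {
            System.out.println(word + " -> " + hasPattern12213(word));
        }
    }
}
```

Because each lookahead is zero-width, every distinctness constraint is checked without consuming input, which is what the character class intersection in the question was trying to express.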

