Negative lookahead regex not working in Java

Question

The following regex works when tested here, but when I use it in my Java code, it doesn't return a match. It uses a negative lookahead to ensure that no blank line (two consecutive newlines) occurs between MAIN LEVEL and Bedrooms. Why won't it work in Java?

regex

^\s*\bMAIN LEVEL\b\n(?:(?!\n\n)[\s\S])*\bBedrooms:\s*(.*)

Java

pattern = Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");
match = pattern.matcher(content);
if (match.find())
{
    // Doesn't reach here
    String bed = match.group(1);
    bed = bed.trim();
}

content is just a string read from a text file, which contains the exact text shown in the demo linked above.

File file = new File("C:\\Users\\ME\\Desktop\\content.txt");
content = new Scanner(file).useDelimiter("\\Z").next();

UPDATE:

I changed my code to include the multiline modifier (?m), but it still prints "null".

pattern = Pattern.compile("(?m)^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");
match = pattern.matcher(content);
if (match.find())
{
    // Still not reaching here
    mainBeds = match.group(1);
    mainBeds = mainBeds.trim();
}
System.out.println(mainBeds);     // Prints null

Answer 1

The problem:

As explained in Alan Moore's answer, there is a mismatch between the line separators used in your file (\r\n) and the one your pattern specifies (\n):

Original code:
Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

Note: I explain what \r and \n represent, and the context and difference between \r\n and \n, in the second item of the "Side notes" section.
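The mismatch can be reproduced with a small sketch. The sample text, the class name CrlfMismatchDemo and the method bedrooms are made up for illustration, since the question's actual file is not shown; only the pattern is taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CrlfMismatchDemo {
    // The pattern from the question: every line break is written as a bare \n.
    static final Pattern LF_ONLY = Pattern.compile(
            "^\\s*\\bMAIN LEVEL\\b\\n(?:(?!\\n\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

    // Returns the captured value after "Bedrooms:", or null when nothing matches.
    static String bedrooms(String content) {
        Matcher m = LF_ONLY.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample text, once with LF and once with CRLF line endings.
        String unixText = "MAIN LEVEL\nKitchen: 1\nBedrooms: 3\n";
        String windowsText = unixText.replace("\n", "\r\n");

        System.out.println(bedrooms(unixText));    // prints 3
        System.out.println(bedrooms(windowsText)); // prints null
    }
}
```

With CRLF text, the character after MAIN LEVEL is \r, so the literal \n in the pattern has nothing to match and find() returns false.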

The solution(s):

Most/all Java versions:
You can use \r?\n to match both formats, and this is sufficient in most cases.
Most/all Java versions:
You can use \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029] to match "Any Unicode linebreak sequence".
Java 8 and later:
You can use the linebreak matcher (\R). It is equivalent to the second method (above), and whenever possible (Java 8 or later), this is the recommended method.

Resulting code (3rd method):
Pattern.compile("^\\s*\\bMAIN LEVEL\\b\\R(?:(?!\\R\\R)[\\s\\S])*\\bBedrooms:\\s*(.*)");
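For comparison, here is a minimal sketch of the first method (\r?\n) applied to the whole pattern, including the lookahead. The class name, method name and sample strings are hypothetical. It deliberately sticks to \r?\n rather than \R: as far as I know, how \R backtracks has changed between Java releases (from Java 9 onward, \R\R can apparently match a single \r\n), so a lookahead built on \R\R is worth testing on your own runtime.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PortableBedroomsDemo {
    // \r?\n accepts both LF and CRLF. The lookahead is widened the same way,
    // so a blank line still stops the search before the next section.
    static final Pattern PORTABLE = Pattern.compile(
            "^\\s*\\bMAIN LEVEL\\b\\r?\\n(?:(?!\\r?\\n\\r?\\n)[\\s\\S])*\\bBedrooms:\\s*(.*)");

    // Returns the captured value after "Bedrooms:", or null when nothing matches.
    static String bedrooms(String content) {
        Matcher m = PORTABLE.matcher(content);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample text with Windows (CRLF) line endings.
        String sameSection = "MAIN LEVEL\r\nKitchen: 1\r\nBedrooms: 3\r\n";
        String otherSection = "MAIN LEVEL\r\nKitchen: 1\r\n\r\nUPPER LEVEL\r\nBedrooms: 2\r\n";

        System.out.println(bedrooms(sameSection));  // prints 3
        System.out.println(bedrooms(otherSection)); // prints null (blank line in between)
    }
}
```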

Side notes:

You can replace \\R\\R with \\R{2}, which is more readable.
Different line-break formats exist and are used on different systems because early OSs inherited their "line-break logic" from mechanical typing machines such as typewriters.
The \r in code represents a Carriage Return, aka CR. The idea behind it is to return the typing cursor to the start of the line.
The \n in code represents a Line Feed, aka LF. The idea behind it is to move the typing cursor to the next line.
The most common line-break formats are CR-LF (\r\n), used primarily by Windows, and LF (\n), used by most UNIX-like systems. This is why "\r?\n will be sufficient in most cases", and you can reliably use it for systems intended for household-grade users.
However, some (rare) OSs, usually on industrial-grade equipment such as servers, may use CR, LF-CR, or something else entirely, which is why the second method has so many characters in it. If you need the code to be compatible with every system, you will need the second or, preferably, the third method.
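If you are not sure which format a file uses, a quick check is to count the separators in the string you read. This is only a sketch: the class LineSeparatorReport and its count method are hypothetical helpers, and only CR, LF and CR-LF are considered.

```java
import java.util.Arrays;

final class LineSeparatorReport {
    // Returns the counts in the order {CR-LF pairs, lone LF, lone CR}.
    static int[] count(String text) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\r') {
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    crlf++;
                    i++; // skip the LF that belongs to this CR-LF pair
                } else {
                    cr++;
                }
            } else if (c == '\n') {
                lf++;
            }
        }
        return new int[] {crlf, lf, cr};
    }

    public static void main(String[] args) {
        // Hypothetical sample: one CR-LF, one lone LF, one lone CR.
        System.out.println(Arrays.toString(count("a\r\nb\nc\rd"))); // prints [1, 1, 1]
    }
}
```

A non-zero first count on text read from a Windows file is a strong hint that a pattern written with a bare \n will not match.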

Here is a useful method for finding where your patterns are failing:

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String content = "...";        // Replace "..." with your content.
String patternString = "...";  // Replace "..." with your pattern.
String lastPatternSuccess = "None. You suck at Regex!";
// Try every prefix of the pattern and remember the longest one that still matches.
for (int i = 0; i <= patternString.length(); i++) {
    try {
        String patternSubstring = patternString.substring(0, i);
        Pattern pattern = Pattern.compile(patternSubstring);
        Matcher matcher = pattern.matcher(content);
        if (matcher.find()) {
            lastPatternSuccess = i + " - Pattern: " + patternSubstring + " - Match: \n" + matcher.group();
        }
    } catch (Exception ex) {
        // This prefix is not a valid regex on its own; ignore it and try the next one.
    }
}
System.out.println(lastPatternSuccess);

Answer 2

It's the line separators. You're looking for \n, but your file actually uses \r\n. If you're running Java 8, you can change every \\n in your code to \\R (the universal line separator). For Java 7 or earlier, use \\r?\\n.

Answer 1: score 4, accepted, 2015-12-27 05:45:24

Answer 2: score 2, 2015-12-27 05:26:23
