
Replace a block of text using a regex pattern (in Java)

I'm trying to remove blocks of text from a file using regular expressions. I have the content of the file in one String, but the Matcher does not match the blocks the way I expect. The example file is:

\begin{comment}
this block should be removed
i.e. it need to be replaced
\end{comment}
this block should remains.
\begin{comment}
this should be removed too.
\end{comment}

I need to find the blocks starting with \begin{comment} and ending with \end{comment}, and then remove them. The minimal code I used is below. The regex I'm using is \\begin\{.*?\\end\{comment\}, which should match anything starting with '\begin' up to the first occurrence of '\end{comment}'. It worked in Notepad++.

However, with this Java code it finds the first '\begin' and the last '\end' line and removes everything in between. I want to keep the lines that are not inside the blocks.
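The symptom described above is the classic difference between a greedy and a lazy (reluctant) quantifier. The sketch below is an illustration, not the asker's code: the class name GreedyVsLazy, the constants GREEDY and LAZY, and the helper countMatches are hypothetical. It runs both patterns, with DOTALL enabled via (?s), over the same sample text and counts the matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: compares greedy .* with lazy .*? on the sample text.
public class GreedyVsLazy {
    // Same sample text as in the question.
    static final String TEXT =
        "\\begin{comment}\n"
        + "this block should be removed\n"
        + "i.e. it need to be replaced\n"
        + "\\end{comment}\n"
        + "this block should remains.\n"
        + "\\begin{comment}\n"
        + "this should be removed too.\n"
        + "\\end{comment}";

    // Greedy: .* runs on to the LAST \end{comment}, swallowing the middle line.
    static final String GREEDY = "(?s)\\\\begin\\{comment\\}.*\\\\end\\{comment\\}";
    // Lazy: .*? stops at the FIRST \end{comment} after each \begin{comment}.
    static final String LAZY = "(?s)\\\\begin\\{comment\\}.*?\\\\end\\{comment\\}";

    // Counts the non-overlapping matches of the given regex in TEXT.
    static int countMatches(String regex) {
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("greedy matches: " + countMatches(GREEDY)); // 1
        System.out.println("lazy matches:   " + countMatches(LAZY));   // 2
    }
}
```

The greedy version produces a single match covering both blocks and the line between them, which is exactly why everything between the first '\begin' and the last '\end' disappears.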

import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class main {
    public static void main(String[] args) {
        String output;
        String s =  "\\begin{comment}\n"+
        "this block should be removed\n"+
        "i.e. it need to be replaced\n"+
        "\\end{comment}\n"+
        "this block should remains.\n"+
        "\\begin{comment}\n"+
        "this should be removed too.\n"+
        "\\end{comment}";
        Matcher m = Pattern.compile("\\\\begin\\{comment(?s).*\\\\end\\{comm.*?\\}").matcher(s);
        while(m.find())
        {
            System.out.println(m.group(0));
            output = m.replaceAll("");
        }

        m = Pattern.compile("\\begin").matcher(s);
        while(m.find())
        {
            System.out.println(m.group(0));
            output = m.replaceAll("");
        }
    }
}
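A side note on the second loop in the code above: the Java literal "\\begin" compiles to the regex \begin, in which \b is a word boundary, not a literal backslash. The regex therefore looks for "egin" at a word boundary, and since "egin" directly follows the letter 'b' here, it finds nothing. The minimal sketch below (the class LiteralBackslash and its helper countRegex are hypothetical names) compares this with two ways of matching a literal backslash: doubling the escape again, or using Pattern.quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows how a backslash must be escaped to be literal.
public class LiteralBackslash {
    // One comment block; the string contains a single literal "\begin".
    static final String TEXT = "\\begin{comment}\nkeep me\n\\end{comment}";

    // Counts the non-overlapping matches of the given regex in TEXT.
    static int countRegex(String regex) {
        Matcher m = Pattern.compile(regex).matcher(TEXT);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Regex \begin = word boundary + "egin": no boundary between 'b' and 'e'.
        System.out.println(countRegex("\\begin"));                    // 0
        // Regex \\begin = literal backslash followed by "begin".
        System.out.println(countRegex("\\\\begin"));                  // 1
        // Pattern.quote wraps the text in \Q...\E so nothing is special.
        System.out.println(countRegex(Pattern.quote("\\begin")));     // 1
    }
}
```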

Update:

I used an online regex tool to find it: Matcher m = Pattern.compile("\\\\begin\\{comment(?s).*\\\\end\\{comm.*?\\}").matcher(s);

You have to fix your code in two places:

  1. The pattern should be consistent with your Notepad++ equivalent: the star should be followed by ? to make it lazy (note the .*? below):

     Matcher m = Pattern.compile("\\\\begin\\{comment}(?s).*?\\\\end\\{comment}").matcher(s);

Note that this pattern works correctly only if no nested comment section exists.

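  2. The second fix concerns the logic: replaceAll resets the matcher and replaces every matching section in a single call, so all blocks are already removed during the first iteration of the m.find() loop. If you need the loop to handle each comment block individually, replace that call with:

     output = m.replaceFirst("");

    and create a new matcher on output before the next m.find(); replaceFirst also resets the matcher and always works on the original string, so reusing the old matcher would loop forever. Or simply apply output = m.replaceAll(""); without any loop at all.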
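Putting both fixes together, a minimal sketch could look like the following. The class RemoveCommentBlocks, the constants COMMENT and SAMPLE, and the methods removeAll and removeOneByOne are hypothetical names. removeAll is the single replaceAll call; removeOneByOne removes one block per iteration with replaceFirst and recreates the matcher on the shortened text each time. Like the pattern above, it assumes comment blocks are not nested.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: removes \begin{comment} ... \end{comment} blocks (non-nested only).
public class RemoveCommentBlocks {
    // Lazy .*? with (?s) so that . also matches newlines; each match ends at the first \end{comment}.
    static final Pattern COMMENT =
        Pattern.compile("\\\\begin\\{comment\\}(?s).*?\\\\end\\{comment\\}");

    // Same sample text as in the question.
    static final String SAMPLE =
        "\\begin{comment}\n"
        + "this block should be removed\n"
        + "i.e. it need to be replaced\n"
        + "\\end{comment}\n"
        + "this block should remains.\n"
        + "\\begin{comment}\n"
        + "this should be removed too.\n"
        + "\\end{comment}";

    // Option A: one replaceAll call, no loop.
    static String removeAll(String s) {
        return COMMENT.matcher(s).replaceAll("");
    }

    // Option B: inspect and remove one block per iteration.
    // replaceFirst resets the matcher and returns a new string, so a fresh
    // matcher is created on that string before the next find().
    static String removeOneByOne(String s) {
        String output = s;
        Matcher m = COMMENT.matcher(output);
        while (m.find()) {
            System.out.println("removing: " + m.group());
            output = m.replaceFirst("");
            m = COMMENT.matcher(output);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(removeAll(SAMPLE));
        System.out.println(removeOneByOne(SAMPLE).equals(removeAll(SAMPLE))); // true
    }
}
```

Both options leave "\nthis block should remains.\n"; the newline after each removed \end{comment} stays because the pattern does not consume it.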
