
Regex optional matching for multiline patterns

I'm trying to figure out how to come up with one regex that supports the following 2 use cases:

Use Case 1:

-- File 1 (input) --
keepthis

junkhere:
this should be removed

Use Case 2:

-- File 2 (input) --
keepthis

------------
junkhere:
this should be removed

Essentially, I'm building one regex to remove everything from "junkhere:" down. However, in use case 2 there is an optional "------------" line that sometimes, but not always, appears on the line before "junkhere:" (I'm not sure of the exact number of -'s).

Output should be:

-- File 3 (output) --
keepthis

I have the following regex and it works for use case 1 but not for use case 2:

    Pattern JUNKHERE_REGEX = Pattern.compile("^(((-+)(.*))?junkhere:(.*))$", Pattern.MULTILINE | Pattern.DOTALL);

    Matcher m = JUNKHERE_REGEX.matcher(text); // text holds the input from either file 1 or file 2
    // n and o stand for other matchers that could be here; I would like to keep the
    // replaceAll code below the same so I don't have to create a new if statement
    if (m.find() || n.find() || o.find()) {
      text = m.replaceAll("");
      text = text.replaceAll("[\n]+$", ""); // delete any trailing newlines
    }
    System.out.println(text); // should echo "keepthis"

I'm not that good with regexes, but what do I need to change to make this work for use case 2 (as well as use case 1)?

Thanks!

Replace every match of [\n\r]+(?:[-]+[\n\r]+)?\s*junkhere:\s*[\n\r][\s\S]* with an empty string.

Test it here: http://regexr.com?37edu and here: http://regexr.com?37ee1


In Java you have to double the backslashes inside a string literal:

text = text.replaceAll("[\\n\\r]+(?:[-]+[\\n\\r]+)?\\s*junkhere:\\s*[\\n\\r][\\s\\S]*", "");
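
To check the pattern against both use cases, here is a minimal, self-contained sketch. The class name JunkStripper and the method stripJunk are hypothetical names chosen for illustration, and the two inputs are the sample files from the question typed in as string literals, assuming plain \n line endings.

```java
import java.util.regex.Pattern;

public class JunkStripper {
    // Same pattern as above, compiled once so it can be reused.
    // Leading [\n\r]+ also swallows the blank line(s) after the kept text,
    // so no separate trailing-newline cleanup is needed.
    private static final Pattern JUNK = Pattern.compile(
        "[\\n\\r]+(?:[-]+[\\n\\r]+)?\\s*junkhere:\\s*[\\n\\r][\\s\\S]*");

    // Hypothetical helper: removes the optional dashed line, "junkhere:"
    // and everything after it.
    static String stripJunk(String text) {
        return JUNK.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // Use case 1: no dashed line before "junkhere:"
        String file1 = "keepthis\n\njunkhere:\nthis should be removed\n";
        // Use case 2: dashed line before "junkhere:"
        String file2 = "keepthis\n\n------------\njunkhere:\nthis should be removed\n";

        System.out.println(stripJunk(file1)); // prints "keepthis"
        System.out.println(stripJunk(file2)); // prints "keepthis"
    }
}
```

Note that the pattern needs at least one line break before "junkhere:" (or before the dashed line) and one after it, so input that starts with "junkhere:" or ends right after it is left untouched.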

