
Java Regex: Match text between two strings, strictly the inner block

I need a regex that matches the text between two strings but returns only the innermost block. I tried a reluctant quantifier, but it did not work.

Here is an example:

<div>
    Hi
</div>
<div class = "quote">
    This is mail.
    <hr tabindex="-1">
    <div color="r">
        <b>From:</b>xyz<br>
        <b>Sent:</b>xyz PM<br>
        <b>To:</b>xyz<br><br>
    </div>
</div>

I used the following regex, compiled with DOTALL so that "." also matches newlines, but it did not work:

<div.*(From:.*Sent:.*To:.*)*?</div>

The regex above matches everything, because the input text starts with <div> and ends with </div>. What I need is only the <div> and </div> that immediately enclose the pattern specified inside the parentheses.
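To make the overshoot concrete, here is a minimal sketch that runs the pattern from the question against the sample input. The class and method names (GreedyDivDemo, matchOriginal) are made up for illustration. The greedy ".*" after "<div" first consumes the rest of the input and then backtracks only as far as the last "</div>", while the lazy group is satisfied with zero repetitions and so never captures anything.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDivDemo {
    // Sample input from the question, joined with newlines (no trailing newline).
    static final String HTML = String.join("\n",
        "<div>",
        "    Hi",
        "</div>",
        "<div class = \"quote\">",
        "    This is mail.",
        "    <hr tabindex=\"-1\">",
        "    <div color=\"r\">",
        "        <b>From:</b>xyz<br>",
        "        <b>Sent:</b>xyz PM<br>",
        "        <b>To:</b>xyz<br><br>",
        "    </div>",
        "</div>");

    // Returns {whole match, group 1} for the question's pattern, or null if nothing matches.
    static String[] matchOriginal(String html) {
        Pattern p = Pattern.compile("<div.*(From:.*Sent:.*To:.*)*?</div>", Pattern.DOTALL);
        Matcher m = p.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] r = matchOriginal(HTML);
        System.out.println(r[0].equals(HTML)); // true: the match spans the whole input
        System.out.println(r[1]);              // null: the lazy group matched zero times
    }
}
```

In other words, the reluctant quantifier on the group does not make the earlier ".*" reluctant; each quantifier is greedy or lazy on its own.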

So I need the output to be:

<div color="r">
        <b>From:</b>xyz<br>
        <b>Sent:</b>xyz PM<br>
        <b>To:</b>xyz<br><br>
</div>

Thanks in advance.

Parsing HTML with regex is not recommended.

If you know what you're doing, you can use the following String#replaceAll call:

String result = html.replaceAll(
        "(?i)(?s).*?(<div\\s*color.*?From:.*?Sent:.*?To:.*?</div>).*", "$1");
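One caveat with replaceAll is that it returns the input unchanged when the pattern does not match, so a missing block is hard to tell apart from a successful extraction. The sketch below is one possible variant, not the answer's own code: it applies the same idea through Matcher#find and returns null when nothing matches. The class and method names (InnerDivExtractor, extract) are hypothetical, and it uses "\\s+" instead of "\\s*" on the assumption that whitespace always separates the tag name from the color attribute.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InnerDivExtractor {
    // Same core pattern as the replaceAll call, minus the leading ".*?" and trailing ".*".
    private static final Pattern INNER_DIV = Pattern.compile(
        "<div\\s+color.*?From:.*?Sent:.*?To:.*?</div>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the first matching block, or null when the input contains none.
    static String extract(String html) {
        Matcher m = INNER_DIV.matcher(html);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<div>Hi</div><div class=\"quote\">This is mail."
            + "<div color=\"r\"><b>From:</b>a<br><b>Sent:</b>b<br><b>To:</b>c<br></div></div>";
        // Prints the inner <div color="r"> ... </div> block only.
        System.out.println(extract(html));
        // Prints null, because there is no From/Sent/To block.
        System.out.println(extract("<div>no header here</div>"));
    }
}
```

Like the original call, this still relies on the inner block containing no nested div of its own, since the lazy ".*?" stops at the first "</div>" after "To:".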

Try this. I'm expanding on my comment so you can see what I mean:

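  // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
  public String findText(String htmlString) {
    // Match innermost <div> blocks only: the lookahead keeps the body from crossing another "<div".
    // DOTALL lets "." match newlines, since the divs span several lines.
    Pattern patt = Pattern.compile("<div\\b[^>]*>(?:(?!<div\\b).)*?</div>",
        Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    Matcher m = patt.matcher(htmlString);
    while (m.find()) {
      // The pattern has no capturing group, so take the whole match (group(1) would throw).
      String text = m.group();
      // check whether this is the div you want: "color" must appear inside the opening tag
      int colorIdx = text.indexOf("color");
      if (colorIdx >= 0 && colorIdx < text.indexOf(">")) { //... or something similar
        return text;
      }
    }
    return null; // no matching div found
  }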
