Java Regex: Match text between two strings, strictly the inner block
I need a regex that matches the text between two strings but takes only the innermost block. I tried using a reluctant quantifier, but it did not work.
Here is an example:
<div>
Hi
</div>
<div class = "quote">
This is mail.
<hr tabindex="-1">
<div color="r">
<b>From:</b>xyz<br>
<b>Sent:</b>xyz PM<br>
<b>To:</b>xyz<br><br>
</div>
</div>
I used this regex, but it did not work (with DOTALL matching, so that "." matches newlines as well):
<div.*(From:.*Sent:.*To:.*)*?</div>
The regex above matches everything, since the input text starts with <div> and ends with </div>, but I need only the <div> and </div> immediately above and below the pattern specified inside the parentheses.
So I need the output to be:
<div color="r">
<b>From:</b>xyz<br>
<b>Sent:</b>xyz PM<br>
<b>To:</b>xyz<br><br>
</div>
Thanks in advance.
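To make the over-matching concrete, here is a minimal sketch that runs the attempted regex, and a reluctant variant of it, against the sample input. The class name GreedyDemo, the helper firstMatch, and the reluctant variant are illustrative assumptions, not part of the original post. It suggests why a reluctant quantifier alone does not help: the engine still starts at the leftmost <div, and laziness only shortens where the match ends.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // The sample input from the question, joined with newlines.
    static final String HTML = String.join("\n",
        "<div>",
        "Hi",
        "</div>",
        "<div class = \"quote\">",
        "This is mail.",
        "<hr tabindex=\"-1\">",
        "<div color=\"r\">",
        "<b>From:</b>xyz<br>",
        "<b>Sent:</b>xyz PM<br>",
        "<b>To:</b>xyz<br><br>",
        "</div>",
        "</div>");

    // Hypothetical helper: returns the first match of regex in input (DOTALL), or null.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The regex from the question: the greedy ".*" runs to the last </div>.
        String greedy = firstMatch("<div.*(From:.*Sent:.*To:.*)*?</div>", HTML);
        System.out.println(greedy.equals(HTML)); // true: the whole input is matched

        // A reluctant variant (an assumption about what was tried): the match
        // ends earlier, but it still begins at the very first <div>.
        String lazy = firstMatch("<div.*?From:.*?Sent:.*?To:.*?</div>", HTML);
        System.out.println(lazy.startsWith("<div>\nHi")); // true
    }
}
```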
It is not recommended to parse HTML using regex.
If you know what you're doing, you can use the following String#replaceAll call:
html.replaceAll("(?i)(?s).*?(<div\\s*color.*?From:.*?Sent:.*?To:.*?</div>).*", "$1");
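As a rough check of how this call behaves, the sketch below wraps it in a method and applies it to the sample input from the question. The class name ReplaceAllDemo and the method name extractHeaderDiv are made up for illustration. The regex is the one from the answer: the leading ".*?" and trailing ".*" consume everything around the captured block, so replacing the whole match with "$1" leaves only that block.

```java
class ReplaceAllDemo {
    // The sample input from the question, joined with newlines.
    static final String HTML = String.join("\n",
        "<div>",
        "Hi",
        "</div>",
        "<div class = \"quote\">",
        "This is mail.",
        "<hr tabindex=\"-1\">",
        "<div color=\"r\">",
        "<b>From:</b>xyz<br>",
        "<b>Sent:</b>xyz PM<br>",
        "<b>To:</b>xyz<br><br>",
        "</div>",
        "</div>");

    // Hypothetical wrapper around the replaceAll call from the answer.
    // (?i) = case-insensitive, (?s) = DOTALL, so '.' also matches newlines.
    static String extractHeaderDiv(String html) {
        return html.replaceAll(
            "(?i)(?s).*?(<div\\s*color.*?From:.*?Sent:.*?To:.*?</div>).*", "$1");
    }

    public static void main(String[] args) {
        // Prints the <div color="r"> ... </div> block only.
        System.out.println(extractHeaderDiv(HTML));
    }
}
```

One caveat worth keeping in mind: when the pattern does not match at all, replaceAll returns the input unchanged, so a caller cannot tell "no such block" apart from "the input was already just the block" without an extra check.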
Try this. I'm expanding on my comment so you'll see what I mean:
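import java.util.regex.Matcher;
import java.util.regex.Pattern;

public String findText(String htmlString) {
    // (?s) lets '.' match newlines; the negative lookahead keeps a match from
    // running into a nested <div>, so each match is an innermost div block.
    Pattern patt = Pattern.compile("(?s)<div\\b[^>]*>(?:(?!<div\\b).)*?</div>");
    Matcher m = patt.matcher(htmlString);
    while (m.find()) {
        // group() is the whole match; group(1) would throw, as there is no capturing group
        String text = m.group();
        // check whether the value of text is the div you want
        int colorAt = text.indexOf("color");
        if (colorAt >= 0 && colorAt < text.indexOf(">")) { //... or something similar
            return text;
        }
    }
    return null;
}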
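The find-then-check idea can also be sketched as a self-contained class that collects every innermost div and then picks the one with the mail headers. Everything named here (InnermostDivDemo, INNER_DIV, innermostDivs, findHeaderDiv) is hypothetical, and the lookahead-based pattern is one possible way to express "a div with no div inside it", not necessarily what the answer intended.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InnermostDivDemo {
    // The sample input from the question, joined with newlines.
    static final String HTML = String.join("\n",
        "<div>",
        "Hi",
        "</div>",
        "<div class = \"quote\">",
        "This is mail.",
        "<hr tabindex=\"-1\">",
        "<div color=\"r\">",
        "<b>From:</b>xyz<br>",
        "<b>Sent:</b>xyz PM<br>",
        "<b>To:</b>xyz<br><br>",
        "</div>",
        "</div>");

    // A <div ...>...</div> pair whose body contains no further "<div",
    // i.e. an innermost block. DOTALL lets '.' cross line breaks.
    static final Pattern INNER_DIV = Pattern.compile(
        "<div\\b[^>]*>(?:(?!<div\\b).)*?</div>", Pattern.DOTALL);

    // Collects every innermost div block in document order.
    static List<String> innermostDivs(String html) {
        List<String> blocks = new ArrayList<>();
        Matcher m = INNER_DIV.matcher(html);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    // Returns the first innermost div holding all three header labels, or null.
    static String findHeaderDiv(String html) {
        for (String block : innermostDivs(html)) {
            if (block.contains("From:") && block.contains("Sent:") && block.contains("To:")) {
                return block;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(findHeaderDiv(HTML));
    }
}
```

Unlike the replaceAll approach, this returns null when no block qualifies, which makes the "not found" case explicit. It still assumes reasonably regular markup; for arbitrary HTML a real parser is the safer choice.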