How to use regex to match HTML tag contents in Java
What I want to accomplish:
I want to match certain explicit content outside of comments.
An example:
<div>
<div>Hello $world$</div>
<div>Another text <!-- $example$--></div>
</div>
<div>
How are $you$?
</div>
<!--
<div>
Lorem ipsum $dolor$ sit
</div>
-->
Words I want to match: $world$, $you$
Words I don't want to match: $example$, $dolor$
So far I have only been able to match either all of them or none.
What I can't do:
I can't simply delete all comments, because I am required to provide the source code I filtered.
I can't tell where your code comes from, but you need to read your page into a String or String[] and then run a regular expression over it to extract the strings you want to filter.
How to use a stream with regex in Java:
How do I create a Stream of regex matches?
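As a rough illustration of streaming regex matches, the sketch below assumes Java 9 or later, where `Matcher.results()` returns a `Stream<MatchResult>`. The class name `RegexStreamSketch` and the method `allPlaceholders` are made up for this example. Note that this pattern alone still matches placeholders inside comments; it only shows the streaming part.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not taken from the original post.
class RegexStreamSketch {
    // A $...$ placeholder. The dollar signs are escaped because an
    // unescaped $ is an end-of-input anchor in Java regex.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$.*?\\$");

    static List<String> allPlaceholders(String page) {
        return PLACEHOLDER.matcher(page)
                .results()                  // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)    // the full matched text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String page = "<div>Hello $world$</div><!-- $example$ -->";
        // Both placeholders are found, including the commented one.
        System.out.println(allPlaceholders(page));
    }
}
```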
How to use regex in Java:
https://www.tutorialspoint.com/java/java_regular_expressions.htm
Test your regular expression before deploying it:
Add alternation
/(?:<!--.*?-->)|(\$.*?\$)/gsm
               ^
and check whether the first capturing group contains anything. It is empty when the match was a comment and holds the $...$ token otherwise.
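The alternation idea might look like this in Java. This is a minimal sketch, not a definitive implementation: the class name `OutsideCommentsSketch` and the method `placeholdersOutsideComments` are invented for the example, `Pattern.DOTALL` stands in for the `s` flag, and the `$` characters are escaped so they are treated as literals. It assumes comments are well formed and not nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the original post.
class OutsideCommentsSketch {
    // Alternative 1 consumes a whole HTML comment without capturing it.
    // Alternative 2 captures a $...$ placeholder in group 1.
    // DOTALL lets . cross line breaks so multi-line comments are consumed.
    private static final Pattern TOKEN_OR_COMMENT =
            Pattern.compile("(?:<!--.*?-->)|(\\$.*?\\$)", Pattern.DOTALL);

    static List<String> placeholdersOutsideComments(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN_OR_COMMENT.matcher(html);
        while (m.find()) {
            // group(1) is null when the match was a comment.
            if (m.group(1) != null) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The sample markup from the question.
        String html = String.join("\n",
                "<div>",
                "<div>Hello $world$</div>",
                "<div>Another text <!-- $example$--></div>",
                "</div>",
                "<div>",
                "How are $you$?",
                "</div>",
                "<!--",
                "<div>",
                "Lorem ipsum $dolor$ sit",
                "</div>",
                "-->");
        System.out.println(placeholdersOutsideComments(html)); // prints [$world$, $you$]
    }
}
```

Because a comment is consumed as a single match, the placeholders inside it are never visited on their own, and the source text itself is left untouched.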