How to use regex to match HTML tag contents in Java

What I want to accomplish:

I want to match certain explicit content outside of comments.

An example:

<div>
    <div>Hello $world$</div>
    <div>Another text <!-- $example$--></div>   
</div>
<div>
    How are $you$?
</div>
<!-- 
<div>
    Lorem ipsum $dolor$ sit
</div>
-->

Words I want to match: $world$, $you$

Words I don't want to match: $example$, $dolor$

So far I have only been able to match either all of them or none.

What I can't do:

I can't delete all the comments, because I am required to provide the source code I filtered.

I can't tell where your code gets its input from, but you need to read your page into a String or String[] and then run a regular expression over it to extract the strings you want to filter.

How to use a stream with regex in Java:

How do I create a Stream of regex matches?
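As a rough illustration of the stream approach, here is a minimal sketch that assumes Java 9 or later, where `Matcher.results()` returns a `Stream<MatchResult>`. The class name `RegexMatchStream` and the method `allTokens` are made up for this example. The pattern simply collects every `$...$` token, so on its own it does not skip tokens inside comments.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, for illustration only.
class RegexMatchStream {

    // A literal '$', then as few characters as possible, then another literal '$'.
    private static final Pattern DOLLAR_TOKEN = Pattern.compile("\\$.*?\\$");

    // Returns every $...$ token found in the text, in order of appearance.
    static List<String> allTokens(String text) {
        return DOLLAR_TOKEN.matcher(text)
                .results()                 // Stream<MatchResult>, available since Java 9
                .map(MatchResult::group)   // the full text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(allTokens("Hello $world$, how are $you$?"));
        // prints [$world$, $you$]
    }
}
```

If the page lives in a file, it could be loaded into a String first, for example with `Files.readString` (Java 11+), and then passed to `allTokens`.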

How to use regex in Java:

https://www.tutorialspoint.com/java/java_regular_expressions.htm

Test your regular expression before deploying it:

https://regexr.com/

Add alternation

/(?:<!--.*?-->)|(\$.*?\$)/gsm
               ^

and check if there is anything in the first capturing group.
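In Java, the alternation idea might look like the sketch below. It is an illustration under a few assumptions, not a definitive implementation: the class name `CommentAwareMatcher` and the method `tokensOutsideComments` are invented here, the dollar signs are escaped so that they match literally, the `s` flag is expressed as `Pattern.DOTALL`, and the `g` flag becomes a `find()` loop. The `m` flag is left out because the pattern contains no `^` or `$` anchors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class CommentAwareMatcher {

    // First alternative: consume a whole comment without capturing anything.
    // Second alternative: capture a $...$ token in group 1.
    // DOTALL lets '.' cross line breaks, so multi-line comments are consumed too.
    private static final Pattern TOKEN_OR_COMMENT =
            Pattern.compile("(?:<!--.*?-->)|(\\$.*?\\$)", Pattern.DOTALL);

    // Returns the $...$ tokens that appear outside of HTML comments.
    static List<String> tokensOutsideComments(String html) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN_OR_COMMENT.matcher(html);
        while (m.find()) {
            String token = m.group(1);
            if (token != null) {   // null means the comment branch matched
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<div>",
                "    <div>Hello $world$</div>",
                "    <div>Another text <!-- $example$--></div>",
                "</div>",
                "<div>",
                "    How are $you$?",
                "</div>",
                "<!-- ",
                "<div>",
                "    Lorem ipsum $dolor$ sit",
                "</div>",
                "-->");
        System.out.println(tokensOutsideComments(html));
        // prints [$world$, $you$]
    }
}
```

Because the comment alternative is tried first at each position, any token inside a comment is swallowed together with the comment and never reaches group 1. The input string itself is left untouched, so the comments stay in the source.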
