
How to use regex to match HTML tag contents in Java

What I want to accomplish:

I want to match certain explicit content outside of comments.

An Example:

<div>
    <div>Hello $world$</div>
    <div>Another text <!-- $example$--></div>   
</div>
<div>
    How are $you$?
</div>
<!-- 
<div>
    Lorem ipsum $dolor$ sit
</div>
-->

Words I want to match: $world$, $you$

Words I don't want to match: $example$, $dolor$

So far I was only able to match either all or none.

What I can't do:

I can't delete all comments because I'm required to provide the source code I filtered.

I can't tell where your input comes from, but you need to read your page into a String or String[] and then run a regular expression over it to extract the strings you want to filter.

How to use a stream with regex in Java:

How do I create a Stream of regex matches?
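
As a rough illustration of streaming regex matches, here is a minimal sketch that assumes Java 9 or later, where `Matcher.results()` returns a `Stream<MatchResult>`. The class name `PlaceholderStream` and the method `findAll` are made up for this example. Note that this pattern on its own does not know about comments, so it returns every `$...$` token it sees.

```java
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of any library.
class PlaceholderStream {
    // \$ matches a literal dollar sign; .*? lazily matches up to the next one.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$.*?\\$");

    // Collects every $...$ token in the input, in order of appearance.
    static List<String> findAll(String html) {
        return PLACEHOLDER.matcher(html)
                .results()                  // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)    // full text of each match
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String html = "<div>Hello $world$</div><!-- $example$ -->";
        // Placeholders inside comments are still included here.
        System.out.println(findAll(html));
    }
}
```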

How to use regex in Java:

https://www.tutorialspoint.com/java/java_regular_expressions.htm

Test your regular expression before deploying it:

https://regexr.com/

Add alternation

/(?:<!--.*?-->)|(\$.*?\$)/gsm
               ^

and check whether the first capturing group matched anything. If it is empty, the match was a comment and can be skipped.
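
In Java, the alternation idea could look roughly like the sketch below. It assumes the dollar signs are escaped as `\$` so they match literally, and it uses `Pattern.DOTALL` in place of the `s` flag so that `.` can cross line breaks inside multi-line comments. The class name `CommentAwareMatcher` and the method `placeholdersOutsideComments` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class CommentAwareMatcher {
    // Alternative 1 consumes a whole HTML comment without capturing it.
    // Alternative 2 captures a $...$ placeholder in group 1.
    // DOTALL lets '.' match line breaks, so multi-line comments are consumed.
    private static final Pattern TOKEN =
            Pattern.compile("(?:<!--.*?-->)|(\\$.*?\\$)", Pattern.DOTALL);

    static List<String> placeholdersOutsideComments(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            String placeholder = m.group(1);
            // group(1) is null when the comment alternative matched.
            if (placeholder != null) {
                found.add(placeholder);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = String.join("\n",
                "<div>",
                "    <div>Hello $world$</div>",
                "    <div>Another text <!-- $example$--></div>",
                "</div>",
                "<div>",
                "    How are $you$?",
                "</div>",
                "<!-- ",
                "<div>",
                "    Lorem ipsum $dolor$ sit",
                "</div>",
                "-->");
        System.out.println(placeholdersOutsideComments(html)); // prints [$world$, $you$]
    }
}
```

Because the comment alternative is tried first at each position, any `$...$` inside a comment is swallowed together with the comment and never reaches group 1. The source text itself is left untouched, so the comments stay in place. One caveat of this sketch: with DOTALL, a stray unpaired `$` would pair up with the next `$` even across lines.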

