Java regular expression: avoiding the alternation (OR) operator

The following regular expression throws a StackOverflowError when applied to a large HTML page:

<li.*?>(.|\s)*?</li>

My hypothesis is that the alternation ("OR") operator ( | ) makes the matcher recurse once per matched character, so on a large HTML page the recursion gets deep enough to overflow the stack.

Is there any way to rewrite this regular expression without the "OR" operator? I want to capture content that may be split over multiple lines, hence the \s.

Many thanks, Tom

The following uses DOTALL mode, enabled with the embedded flag (?s), so that the dot . also matches line break characters.

(?s)<li[^>]*>.*?</li>

It is important, however, that the match for the opening <li...> tag cannot backtrack past its closing >, which is why I used [^>]* instead of .*? there.
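
As a minimal sketch of how this pattern could be used from Java: the class name LiExtractor, the method extractItems, and the capturing group around .*? are my own additions for illustration and are not part of the question. Because the alternation group is gone, the lazy .*? repeats a single-character match, which java.util.regex handles without recursing once per character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for the example.
final class LiExtractor {
    // (?s) enables DOTALL, so '.' also matches line terminators.
    // [^>]* stops the opening tag at its first '>'.
    // The capturing group (an assumption of this sketch) exposes the item body.
    private static final Pattern LI = Pattern.compile("(?s)<li[^>]*>(.*?)</li>");

    // Returns the text between each <li ...> and the next </li>.
    static List<String> extractItems(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = LI.matcher(html);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<ul>\n<li class=\"a\">first\nline</li>\n<li>second</li>\n</ul>";
        for (String item : extractItems(html)) {
            System.out.println("[" + item + "]");
        }
    }
}
```

Note that <li[^>]*> also matches other tags whose names start with "li", such as <link>, and that nested lists are not handled; for anything beyond simple pages an HTML parser is likely the safer choice.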
