The following regular expression throws a StackOverflowError when applied to a large HTML page:
<li.*?>(.|\s)*?</li>
My hypothesis is that the alternation ("OR") operator (|) causes recursive calls in the matcher and that, because the HTML page to be parsed is so large, these calls overflow the stack.
Is there any way I can rewrite this regular expression without the "OR" operator? I want to capture content that may be split over multiple lines, hence the need for \s.
Many thanks, Tom
The following uses DOTALL mode, enabled by the inline flag (?s), to let the dot (.) also match line break characters.
(?s)<li[^>]*>.*?</li>
It is important, however, that no backtracking into the opening <li...> tag occurs, hence the [^>]* I chose in place of .*? inside the tag.
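
As a rough illustration, the pattern can be applied in Java as sketched below. The class name LiExtractor, the extract helper, and the capturing group around .*? are assumptions added for this example; they are not part of the original pattern. The lazy .*? quantifies a single dot rather than a group containing an alternation, which should avoid the per-character recursion that (.|\s)*? tends to cause in java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiExtractor {
    // (?s) enables DOTALL, so '.' also matches line terminators.
    // [^>]* cannot run past the end of the opening tag.
    // The capturing group around .*? is an addition for this sketch.
    private static final Pattern LI =
            Pattern.compile("(?s)<li[^>]*>(.*?)</li>");

    // Hypothetical helper: returns the inner content of every <li> element.
    static List<String> extract(String html) {
        List<String> items = new ArrayList<>();
        Matcher m = LI.matcher(html);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<ul>\n<li class=\"a\">first\nitem</li>\n<li>second</li>\n</ul>";
        for (String item : extract(html)) {
            System.out.println("[" + item + "]");
        }
    }
}
```

Passing Pattern.DOTALL as the second argument to Pattern.compile would be an equivalent alternative to the inline (?s) flag.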