简体   繁体   中英

End-of-string regex match too slow

Demo here . The regex:

([^>]+)$

I want to match text at the end of a HTML snippet that is not contained in a tag (ie, a trailing text node). The regex above seems like the simplest match, but the execution time seems to scale linearly with the length of the match-text (and has causes hangs in the wild when used in my browser extension). It's also equally slow for matching and non-matching text.

Why is this seemingly simple regex so bad?

(I also tried RegexBuddy but can't seem to get an explanation from it.)

Edit: Here's a snippet for testing the various regexes (click "Run" in the console area).
Edit 2: And a no-match test .

Consider an input like this

abc<def>xyz

With your original expression, ([^>]+)$ , the engine starts from a , fails on > , backtracks, restarts from b , then from c etc. So yes, the time grows with size of the input. If, however, you force the engine to consume everything up to the rightmost > first, as in:

.+>([^>]+)$

the backtracking will be limited by the length of the last segment, no matter how much input is before it.

The second expression is not equivalent to the first one, but since you're using grouping, it doesn't matter much, just pick matches[1] .

Hint: even when you target javascript, switch to the pcre mode, which gives you access to the step info and debugger:

在此输入图像描述

(look at the green bars!)

You could use the actual DOM instead of Regex, which is time consuming:

 var html = "<div><span>blabla</span></div><div>bla</div>Here I am !"; var temp = document.createElement('div'); temp.innerHTML = html; var lastNode = temp.lastChild || false; if(lastNode.nodeType == 3){ alert(lastNode.nodeValue); } 

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM