End-of-string regex match too slow

Demo here. The regex:

([^>]+)$

I want to match text at the end of an HTML snippet that is not contained in a tag (i.e., a trailing text node). The regex above seems like the simplest match, but the execution time seems to scale linearly with the length of the match text (and it has caused hangs in the wild when used in my browser extension). It's also equally slow for matching and non-matching text.

Why is this seemingly simple regex so bad?

(I also tried RegexBuddy but can't seem to get an explanation from it.)

Edit: Here's a snippet for testing the various regexes (click "Run" in the console area).
Edit 2: And a no-match test.

Consider an input like this:

abc<def>xyz

With your original expression, ([^>]+)$, the engine starts at "a", greedily consumes "abc<def", finds ">" where it needs the end of the string, backtracks through every consumed character, then restarts from "b", then from "c", and so on. So yes, the time grows with the size of the input, because every start position inside a run of non-">" characters rescans the rest of that run. If, however, you force the engine to consume everything up to the rightmost > first, as in:

.+>([^>]+)$

the backtracking will be limited by the length of the last segment, no matter how much input comes before it.
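To make the difference concrete, here is a toy model of a backtracking engine that counts character steps for both patterns. It is only an illustration under simplifying assumptions (no newlines in the input, no engine optimizations); real engines count and optimize work differently, and the helper names stepsOriginal and stepsImproved are hypothetical.

```javascript
// Toy backtracking model: hypothetical helpers, not a real engine API.
// Assumes the input has no newlines, so "." matches every character.

// Models /([^>]+)$/ : the engine retries from every start position.
function stepsOriginal(input) {
  let steps = 0;
  for (let start = 0; start < input.length; start++) {
    // Greedy [^>]+ : run forward until '>' or the end of the input.
    let end = start;
    while (end < input.length && input[end] !== '>') { end++; steps++; }
    // Backtrack: give characters back one at a time, testing $ each time.
    for (let e = end; e > start; e--) {
      steps++;
      if (e === input.length) return { match: input.slice(start, e), steps };
    }
  }
  return { match: null, steps };
}

// Models /.+>([^>]+)$/ : .+ runs to the end first, then backs up to a '>'.
function stepsImproved(input) {
  let steps = 0;
  for (let start = 0; start < input.length; start++) {
    steps += input.length - start; // greedy .+ consumes the rest of the input
    // Backtrack .+ (it must keep at least one character), looking for '>'.
    for (let p = input.length; p > start; p--) {
      steps++;
      if (input[p] !== '>') continue;
      // Greedy [^>]+ after the '>', then backtrack it while testing $.
      let end = p + 1;
      while (end < input.length && input[end] !== '>') { end++; steps++; }
      for (let e = end; e > p + 1; e--) {
        steps++;
        if (e === input.length) return { match: input.slice(p + 1, e), steps };
      }
    }
  }
  return { match: null, steps };
}

const input = 'a'.repeat(1000) + '>xyz';
console.log(stepsOriginal(input).steps); // 1001004
console.log(stepsImproved(input).steps); // 1013
```

In this model, making the run of "a" characters ten times longer multiplies the first count by roughly a hundred, while the second grows only with the single greedy pass of .+; its backtracking stays within the last segment.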

The second expression is not equivalent to the first one, but since you're using grouping, it doesn't matter much: just pick matches[1].
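In JavaScript, picking the group could look like this minimal sketch. The helper name trailingText is hypothetical, and the s (dotAll) flag is an addition here so that . also crosses newlines in multi-line snippets. One visible difference from the original pattern: a snippet with no > at all returns null here.

```javascript
// Hypothetical helper: returns the text after the last '>', or null.
function trailingText(html) {
  // .+ first runs to the end, then backs up to the last '>';
  // the s (dotAll) flag lets . match newlines as well.
  const matches = html.match(/.+>([^>]+)$/s);
  return matches ? matches[1] : null;
}

trailingText('<div><span>blabla</span></div><div>bla</div>Here I am !'); // "Here I am !"
trailingText('<div>bla</div>'); // null: nothing follows the last '>'
trailingText('plain text');     // null, whereas /([^>]+)$/ would match it
```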

Hint: even when you target JavaScript, switch to PCRE mode, which gives you access to the step info and the debugger:

[Screenshot: regex101 in PCRE mode]

(Look at the green bars!)

You could use the actual DOM instead of a regex, since the regex is the time-consuming part:

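    var html = "<div><span>blabla</span></div><div>bla</div>Here I am !";
    var temp = document.createElement('div');
    temp.innerHTML = html; // parse the snippet into real nodes
    var lastNode = temp.lastChild; // null for an empty snippet
    // Only a trailing text node (nodeType 3) is wanted
    if (lastNode && lastNode.nodeType === Node.TEXT_NODE) {
      alert(lastNode.nodeValue);
    }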
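If building DOM nodes is not an option, a plain string search returns the same result as ([^>]+)$ in one pass with no backtracking. This is a different technique from both the DOM approach and the regexes, sketched under the assumption that, like the regex, you only want the text after the last >; the helper name is hypothetical.

```javascript
// Hypothetical regex-free equivalent of /([^>]+)$/ (returns group 1 or null).
function trailingTextNoRegex(html) {
  const lastGt = html.lastIndexOf('>'); // -1 when there is no '>' at all
  const tail = html.slice(lastGt + 1);  // the whole string when lastGt === -1
  return tail.length > 0 ? tail : null; // [^>]+ needs at least one character
}

trailingTextNoRegex('<div><span>blabla</span></div><div>bla</div>Here I am !'); // "Here I am !"
```

Like the regex, this treats every > as the end of a tag, so a literal > inside the trailing text (which HTML allows) cuts the result short; the DOM approach does not have that limitation.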
