What is the fastest way to strip a specific tag from a string

Question

I have HTML in a string. I want to strip the <head> part of it. I use:

$html = preg_replace("/<head[^>]*?>.*?<\/head>/s", "", $html);

But in terms of performance, this can be a bit heavy. Is there a better alternative?

I know that I can use strip_tags() and list all accepted tags in the second argument, but there are too many to list.

Answer 1

Your current regex takes 6720 steps when tested against part of this SO page.

This regex <head[^>]*?>(?:[^<]*<??)*</head> only takes 376 steps, and it should return the same thing. That is about 18x fewer steps, so it should be considerably faster than your regex.
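Step counts from a regex tester do not translate directly into wall-clock time, so it may be worth measuring both patterns yourself. The following is only a rough sketch: the sample document, the `time_pattern` helper, and the run count are made up for illustration, and `~` is used as the delimiter so that `</head>` needs no escaping.

```php
<?php
// Hypothetical sample document; a real page will have a much larger <head>.
$head = "<head>\n<title>Demo</title>\n"
      . str_repeat("<meta name=\"k\" content=\"v\">\n", 20)
      . "</head>";
$html = "<html>\n" . $head . "\n<body><p>Hello</p></body>\n</html>";

$patterns = [
    'original' => '~<head[^>]*?>.*?</head>~s',          // pattern from the question
    'answer'   => '~<head[^>]*?>(?:[^<]*<??)*</head>~', // pattern from this answer
];

// Hypothetical helper: run the replacement $runs times,
// return [elapsed seconds, last result].
function time_pattern(string $pattern, string $subject, int $runs): array
{
    $result = null;
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $result = preg_replace($pattern, '', $subject);
    }
    return [(hrtime(true) - $start) / 1e9, $result];
}

$results = [];
foreach ($patterns as $name => $pattern) {
    [$seconds, $results[$name]] = time_pattern($pattern, $html, 10000);
    printf("%-8s %.4f s\n", $name, $seconds);
}
```

The timings will vary with the PHP version, the PCRE JIT setting, and the input, so treat the numbers as indicative only and test against your own HTML.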

It works by greedily matching everything that's not a < with [^<]*

Then, because <?? is lazy, it will try to immediately match </head>. If there is no match, the <?? kicks in and consumes a single <, and the group repeats from there.
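To see the pattern in use, here is a minimal sketch that wraps it in a function. The name `strip_head` and the sample input are assumptions for illustration, not part of the original question.

```php
<?php
// Hypothetical helper wrapping the pattern from this answer.
function strip_head(string $html): string
{
    // "~" is the delimiter, so the "/" in "</head>" needs no escaping.
    $pattern = '~<head[^>]*?>(?:[^<]*<??)*</head>~';
    $result = preg_replace($pattern, '', $html);
    // preg_replace() returns null on error; fall back to the input unchanged.
    return $result ?? $html;
}

// Sample input with an attribute on <head>, nested tags, and a stray "<".
$html = '<html><head lang="en"><title>a < b</title>'
      . '<script>var x = 1;</script></head>'
      . '<body><h1>Hi</h1></body></html>';

echo strip_head($html), "\n";
```

The null check is there because the nested quantifiers may backtrack heavily when an opening <head has no closing </head> after it, in which case preg_replace() can give up and return null. Note also that <head[^>]*?> will match any tag name starting with "head", such as <header>, just like the original regex.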

Answer 1 0 2016-04-19 17:08:51
