What's the quickest way to strip a specific tag and its contents from a string?

Question

I have HTML in a string. I want to strip the <head> part of it. I use:

$html = preg_replace("/<head[^>]*?>.*?<\/head>/s", "", $html);

But in terms of performance, this can be a bit heavy. Is there a better alternative?

I know I could use strip_tags() and list every allowed tag in its second argument, but there are too many to list.
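For reference, here is the question's regex wrapped in a minimal runnable script. The helper name strip_head_lazy() and the sample HTML are made up for illustration. One thing worth knowing: <head[^>]*?> would also match the opening of an HTML5 <header> element. The sample below avoids that case.

```php
<?php
// Hypothetical helper name; the pattern is the one from the question.
// The "s" modifier lets "." match newlines, so a multi-line <head> is removed too.
// preg_replace() replaces every match, not just the first one.
function strip_head_lazy(string $html): string
{
    // preg_replace() returns null on a PCRE error; keep the input in that case.
    return preg_replace("/<head[^>]*?>.*?<\/head>/s", "", $html) ?? $html;
}

$html = "<html>\n<head>\n<title>Demo</title>\n</head>\n<body><p>Hi</p></body>\n</html>";
echo strip_head_lazy($html), "\n";
```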

Answer 1

Your current regex takes 6720 steps when tested against part of this SO page.

This regex, <head[^>]*?>(?:[^<]*<??)*</head>, takes only 376 steps and should return the same match. That is roughly 18x fewer steps (6720 / 376 ≈ 17.9), so it should be considerably faster than your regex.
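A minimal sketch of the pattern above as PHP code, assuming it is used with preg_replace() just like the question's version. The names HEAD_LOOP, HEAD_LAZY and strip_with() and the sample HTML are hypothetical. In PHP the pattern needs delimiters, and the / in </head> must be escaped. The s modifier isn't needed because [^<] already matches newlines. PHP does not report regex step counts, so the script only checks that both patterns give the same output and then times them. Timings will vary by machine, PHP version and whether the PCRE JIT is on. Because the pattern nests a * inside another *, input with no closing </head> may trigger heavy backtracking, so the helper falls back to the original string if preg_replace() returns null.

```php
<?php
// The answer's pattern as a PHP string: delimiters added, "/" in </head> escaped.
// No "s" modifier: the negated class [^<] already matches newlines.
const HEAD_LOOP = '/<head[^>]*?>(?:[^<]*<??)*<\/head>/';

// The question's original pattern, kept for comparison.
const HEAD_LAZY = '/<head[^>]*?>.*?<\/head>/s';

function strip_with(string $pattern, string $html): string
{
    // preg_replace() returns null on a PCRE error (e.g. pcre.backtrack_limit
    // exceeded); fall back to the unmodified input in that case.
    return preg_replace($pattern, '', $html) ?? $html;
}

// Made-up sample: a multi-line <head> containing several tags.
$html = "<!DOCTYPE html>\n<html>\n<head>\n"
      . "  <meta charset=\"utf-8\">\n"
      . "  <title>Demo</title>\n"
      . "  <link rel=\"stylesheet\" href=\"style.css\">\n"
      . "</head>\n<body><p>Hello</p></body>\n</html>";

// Both patterns should remove exactly the same span.
var_dump(strip_with(HEAD_LOOP, $html) === strip_with(HEAD_LAZY, $html));

// Rough wall-clock comparison; only the relative numbers are meaningful.
$runs = 20000;
foreach (['loop' => HEAD_LOOP, 'lazy' => HEAD_LAZY] as $name => $pattern) {
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        strip_with($pattern, $html);
    }
    printf("%s: %.3f s\n", $name, microtime(true) - $start);
}
```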

It works by greedily matching every character that isn't a < with [^<]*.

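Then, because <?? is lazy, the engine first tries to match </head> right at that position. Only if that fails does <?? consume the <, after which the group repeats and [^<]* matches the next run of non-< characters.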
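To see the mechanism at work, the sketch below runs the same pattern with preg_match() and PREG_OFFSET_CAPTURE on a made-up input whose <head> contains tags and a bare < inside a script. The match should start at <head> and stop at the first </head>: each < that is not the start of </head> is simply consumed by <?? and the loop carries on.

```php
<?php
// Same pattern as in the answer, with PHP delimiters.
$pattern = '/<head[^>]*?>(?:[^<]*<??)*<\/head>/';

// Made-up input: the <head> contains tags and a bare "<" inside a script.
$html = '<html><head><title>T</title><script>if (a < b) {}</script></head>'
      . '<body>x</body></html>';

if (preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
    [$text, $offset] = $m[0];
    // [^<]* swallows runs such as "T" or "if (a "; at each "<" the lazy <??
    // first lets </head> try to match and only consumes the "<" when that fails.
    echo "matched from offset $offset, length " . strlen($text) . "\n";
    echo $text, "\n";
}
```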

solution1
0 2016-04-19 17:08:51