PHP regular expressions to clean duplicated HTML tags

Question

I am trying to get a regular expression to work, but I am not having much luck.

The source file I am reading (poorly formatted, but there is nothing I can do about that) has the following between elements:

<BR>
<BR>
<BR>

How do I match this with a PHP regular expression?

Answer 1

Something like this:

preg_match('/(<br>\s*){3}/i', $str, $matches);

This is a bit more lenient than your example: it does a case-insensitive match and allows any whitespace between the <br> tags, not just newlines.

To match 3 or more instead of exactly 3:

preg_match('/(<br>\s*){3,}/i', $str, $matches);
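As a minimal sketch of how the result of that call could be inspected, the snippet below wraps the pattern in a helper. The function name findBrRun and the sample strings are hypothetical and only for illustration. It uses a non-capturing group, which matches the same text as the pattern above.

```php
<?php
// Hypothetical helper: returns the first run of three or more <br> tags,
// or null when no such run exists.
function findBrRun(string $html): ?string
{
    // (?:...) groups without capturing; \s* allows any whitespace after each tag;
    // {3,} requires at least three repetitions; the i flag ignores case.
    if (preg_match('/(?:<br>\s*){3,}/i', $html, $matches) === 1) {
        // $matches[0] holds the whole matched run, trailing whitespace included.
        return $matches[0];
    }
    return null;
}

// Made-up sample input: three tags separated by newlines.
$sample = "<p>one</p>\n<BR>\n<BR>\n<BR>\n<p>two</p>";
var_dump(findBrRun($sample));
var_dump(findBrRun("<br>\n<br>\n<p>two</p>")); // NULL: only two tags
```

Note that the matched run includes the whitespace after the last tag, because \s* is inside the repeated group.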

Answer 2

If you just want to replace every <BR> instance, then you're better off doing a plain string replacement. It is generally faster than a regex.

$newstr = str_replace('<BR>', 'replacement...', $str);
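Two things are worth keeping in mind with this approach: str_replace is case-sensitive, and it replaces every occurrence, not only runs of three. A small sketch with made-up input, using the built-in case-insensitive variant str_ireplace for comparison:

```php
<?php
// Made-up input containing one upper-case and one lower-case tag.
$str = "a<BR>b<br>c";

// str_replace is case-sensitive, so only the upper-case tag is replaced.
$caseSensitive = str_replace('<BR>', '', $str);

// str_ireplace ignores case and replaces both tags.
$caseInsensitive = str_ireplace('<BR>', '', $str);

echo $caseSensitive, "\n";   // ab<br>c
echo $caseInsensitive, "\n"; // abc
```

If single or double <BR> tags need to stay in place, as in the question, a plain string replacement cannot tell them apart from a run of three.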

Answer 3

My take on it:

<?php

$html = <<<HTML
<BR>
<BR>
<BR>
<p>^^ Replace 3 consecutive BR tags with nothing</p>
<BR>
<BR>
<p>^^ those should stay, there's only 2 of them</p>
<BR>
  <BR>


      <BR>
<p>^^ But those should go, whitespace and newlines shouldn't matter</p>
HTML;

echo preg_replace('/(?:<br>\s*){3}/i', '', $html);
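One detail to watch with {3}: on a run of four tags it removes only the first three. The sketch below, on made-up input, compares {3} with {3,} and also shows one possible way to collapse a run into a single tag instead of deleting it. The collapsing variant is an assumption about what "cleaning" might mean here, not something the question asks for.

```php
<?php
// Made-up input: a run of four tags followed by a paragraph.
$html = "<BR>\n<BR>\n<BR>\n<BR>\n<p>text</p>";

// Exactly three: the first three tags are removed, the fourth survives.
$exact = preg_replace('/(?:<br>\s*){3}/i', '', $html);

// Three or more: the whole run is removed.
$atLeast = preg_replace('/(?:<br>\s*){3,}/i', '', $html);

// Three or more, collapsed into a single tag plus newline.
$collapsed = preg_replace('/(?:<br>\s*){3,}/i', "<br>\n", $html);

echo $exact, "\n";
echo $atLeast, "\n";
echo $collapsed, "\n";
```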

3 answers

Answer 1: 5, accepted, 2009-09-01 19:22:03

Answer 2: 3, 2009-09-01 19:36:29

Answer 3: 1, 2009-09-01 19:39:35
