
PHP regular expressions to clean duplicated HTML tags

I am trying to get a regular expression to work, but I am not having much luck.

The source file I am reading (poorly formatted, but there is nothing I can do about that) has the following between elements:

<BR>
<BR>
<BR>

How do I match this with a PHP regular expression?

Something like this:

preg_match('/(<br>\s*){3}/i', $str, $matches);

This is a bit more lenient than your example: it does a case-insensitive match and allows any whitespace between the <br> tags, not just newlines.

To match 3 or more instead of 3:

preg_match('/(<br>\s*){3,}/i', $str, $matches);
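To see what ends up in $matches, here is a minimal sketch. The sample string is an assumption modeled on the question, not the asker's actual file.

```php
<?php
// Sample input (assumed): three <BR> tags separated by newlines.
$str = "<p>one</p>\n<BR>\n<BR>\n<BR>\n<p>two</p>";

// {3,} is greedy, so the whole run of tags is consumed as a single match.
$found = preg_match('/(<br>\s*){3,}/i', $str, $matches);

var_dump($found);       // 1 when a run of three or more was found
var_dump($matches[0]);  // the full run, including the whitespace after each tag
var_dump($matches[1]);  // a repeated group keeps only its last repetition
```

If you do not need $matches[1], a non-capturing group (?: ... ) avoids storing it.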

If you just want to replace every <BR> instance, regardless of how many appear in a row, you're better off doing a plain string replacement. It is generally faster than a regex.

$newstr = str_replace('<BR>', 'replacement...', $str);
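One thing to watch for: str_replace() is case-sensitive, so it will miss a lower-case <br>. A small sketch of the difference, using a made-up input string:

```php
<?php
// Made-up input with one upper-case and one lower-case tag.
$str = "a<BR>b<br>c";

// str_replace() is case-sensitive: only the upper-case tag is replaced.
$sensitive = str_replace('<BR>', '', $str);

// str_ireplace() ignores case, so both spellings are replaced.
$insensitive = str_ireplace('<BR>', '', $str);

echo $sensitive, "\n";    // ab<br>c
echo $insensitive, "\n";  // abc
```

Neither function can express "three in a row with arbitrary whitespace in between", so this only fits if every single tag should go.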

My take on it

<?php

$html = <<<HTML
<BR>
<BR>
<BR>
<p>^^ Replace 3 consecutive BR tags with nothing</p>
<BR>
<BR>
<p>^^ those should stay, there's only 2 of them</p>
<BR>
  <BR>


      <BR>
<p>^^ But those should go, whitespace and newlines shouldn't matter</p>
HTML;

echo preg_replace('/(?:<br>\s*){3}/i', '', $html);
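Note that {3} removes tags in groups of exactly three, so a run of four leaves one tag behind. If whole runs should go, {3,} does that. The sketch below shows the difference. strip_br_runs() is a hypothetical helper name, and accepting <br/> and <br /> is an extra assumption that goes beyond the input shown in the question.

```php
<?php
// Hypothetical helper (not from the answers above): removes every run of
// three or more <br> tags, accepting <br>, <br/> and <br /> in any case.
function strip_br_runs(string $html): string
{
    // preg_replace() returns null on a regex error; fall back to the input.
    return preg_replace('/(?:<br\s*\/?>\s*){3,}/i', '', $html) ?? $html;
}

$four = "<BR>\n<BR>\n<BR>\n<BR>\n<p>text</p>";

// With {3}, the first three tags are removed and the fourth is left behind.
$fixed = preg_replace('/(?:<br>\s*){3}/i', '', $four);

// With {3,}, the whole run is removed.
$open = strip_br_runs($four);

var_dump($fixed);
var_dump($open);
```

Runs of only two tags are left untouched by both patterns.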

