PHP regex: remove everything between the last occurrence of <br> and a string

In a text, I want to remove everything between the last occurrence of <br> and a string.

Let's say I have this text:

Lorem ipsum dolor sit amet, <br> 
consectetur adipisicing elit, <br> 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <br> 
Some text I want to remove because it is useless.

I want to remove everything between the last <br> and "useless." (including the delimiters).

The expected result would be:

Lorem ipsum dolor sit amet, <br> 
consectetur adipisicing elit, <br> 
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

$text = <<< EOD
Lorem ipsum dolor sit amet, <br>
consectetur adipisicing elit, <br>
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <br>
Some text I want to remove because it is useless.
EOD;

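echo(preg_replace('/(?s)\s*<br>(?!.*<br>).*useless\./', '', $text));

The code above prints:

Lorem ipsum dolor sit amet, <br>
consectetur adipisicing elit, <br>
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.

The negative lookahead (?!.*<br>) makes the pattern match only the last <br>, because it rejects any <br> that has another <br> somewhere after it. The leading \s* and the trailing \. also remove the space before the tag and the final period, so the output matches the expected result.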
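The same lookahead idea can be wrapped in a small helper so the two delimiters are not hard-coded. This is only a sketch: the function name removeFromLastTag and its parameters are made up for illustration, and preg_quote() is used so that the delimiters are treated as literal text.

```php
<?php
// Hypothetical helper (not part of the original answer).
// Removes everything from the last occurrence of $start through the
// last occurrence of $end that follows it, including both delimiters
// and any whitespace directly before $start.
function removeFromLastTag(string $text, string $start, string $end): string
{
    $s = preg_quote($start, '/');
    $e = preg_quote($end, '/');

    // (?!.*$s) rejects any $start that still has another $start after it.
    // The trailing "s" flag lets "." match newlines.
    $pattern = '/\s*' . $s . '(?!.*' . $s . ').*' . $e . '/s';

    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace($pattern, '', $text) ?? $text;
}

$text = "one <br>\ntwo <br>\nthree. <br>\nSome text that is useless.";
echo removeFromLastTag($text, '<br>', 'useless.');
```

If the end delimiter does not appear after the last <br>, the pattern does not match and the text is returned unchanged.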

$modified_text = preg_replace('/^(.*)<br>(.*)$/s', '$1', $original_text);

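This should create a modified_text variable that contains everything in original_text before the last <br>. Note that it also discards everything after that <br> through the end of the string, not just the text up to "useless.".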
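To make the difference visible, here is a small comparison on a made-up sample string with extra text after "useless.". The second pattern is an assumed variant, not part of the original answer: it keeps the greedy (.*) trick for finding the last <br> but stops at the marker instead of at the end of the string.

```php
<?php
// Made-up sample with trailing text after the marker.
$original_text = "one <br>\ntwo <br>\nthree.<br>\nSome text that is useless. Keep me.";

// Pattern from the answer: the greedy (.*) runs to the last <br>,
// and the rest of the string (up to $) is dropped.
$toEnd = preg_replace('/^(.*)<br>(.*)$/s', '$1', $original_text);

// Assumed variant: stop at the first "useless." after the last <br>,
// so whatever follows the marker is kept.
$toMarker = preg_replace('/^(.*)<br>.*?useless\./s', '$1', $original_text);

var_dump($toEnd);
var_dump($toMarker);
```

With this input, $toEnd ends at "three." while $toMarker still contains " Keep me.".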

You can do it like this:

$txt = preg_replace('~<br>(?>[^<u]++|<(?!br>)|u(?!seless))*(?>useless\.?|$)~', '', $txt);

Advantages:

  • little backtracking
  • few lookahead tests (only when a < or a u is found)
  • dotall mode is not needed
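A quick way to see how this pattern behaves is to run it on two made-up inputs, one with the marker and one without. The sample strings below are assumptions for illustration. Because of the final (?>useless\.?|$) alternation, the pattern falls back to removing through the end of the string when "useless" is missing.

```php
<?php
// Pattern copied from the answer above.
$pattern = '~<br>(?>[^<u]++|<(?!br>)|u(?!seless))*(?>useless\.?|$)~';

// Made-up inputs: one contains the marker, the other does not.
$withMarker    = "one <br>\ntwo <br>\nthree.<br>\nSome text that is useless. Keep me.";
$withoutMarker = "one <br>\ntwo <br>\nthree.<br>\nNo marker here";

// Only the last <br> can match: after an earlier <br> the loop stops
// at the next "<br>", where neither "useless" nor the end of the string follows.
$a = preg_replace($pattern, '', $withMarker);
$b = preg_replace($pattern, '', $withoutMarker);

var_dump($a);
var_dump($b);
```

In the first case the text after "useless." is kept; in the second, everything from the last <br> onward is removed.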
