Finding *two* html tags with Regular Expressions

I need to pull the content out of two paragraph tags and separate it with a <br /> tag. The input looks like this:

<p>
Yay
</p>
<p>
StackOverFlow
</p>

The output needs to look like this:

<p>
Yay <br />
StackOverflow
</p>

What I have so far is:

<p><?php preg_match('/<p>(.*)<\/p>/', $content, $match); echo $match[1] . "..."; ?></p>

which pulls the first paragraph only:

<p>
Yay...
</p>

Also, is it possible to set a character limit, for example a maximum of 40 characters across both paragraphs, or would I have to use substr?

Thanks!

So it turned out to be:

<?php $content = preg_replace('/<\/p>\s*<p>/', '<br/>', $content);  echo substr("$content",0,180)."..."; ?>
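On the character-limit part of the question: calling substr directly on the HTML can cut through a tag. A minimal sketch of one alternative, assuming a plain-text excerpt is acceptable; the helper name `excerpt` and its default limit are hypothetical, not from the original posts:

```php
<?php
// Hypothetical helper: returns at most $limit characters of visible text,
// appending "..." only when the text was actually truncated.
function excerpt(string $html, int $limit = 40): string
{
    // Turn paragraph boundaries into a space, then drop the remaining tags,
    // so the limit counts visible characters and never cuts through a tag.
    $text = preg_replace('#</p>\s*<p>#', ' ', $html);
    // Collapse runs of whitespace (including newlines) into single spaces.
    $text = trim(preg_replace('/\s+/', ' ', strip_tags($text)));

    if (strlen($text) <= $limit) {
        return $text;
    }
    return substr($text, 0, $limit) . '...';
}

$sample = "<p>\nYay\n</p>\n<p>\nStackOverflow\n</p>";
echo excerpt($sample, 40), "\n"; // short input is returned unchanged
echo excerpt($sample, 8), "\n";  // longer than the limit, so it is cut
```

strlen and substr count bytes, so multibyte text would probably need mb_strlen and mb_substr instead.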

Do yourself a favor and use an HTML parser (DOMDocument::loadHTML, for example). It's easier and less fragile.
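A rough sketch of what the parser route might look like, assuming the input contains only simple paragraphs of text. The helper name `mergeParagraphs` is hypothetical, and any markup nested inside the paragraphs is flattened to plain text here:

```php
<?php
// Hypothetical helper (not from the original posts): collects the text of
// every <p> element and joins it into a single <p>, separated by <br>.
function mergeParagraphs(string $html): string
{
    $doc = new DOMDocument();
    // The libxml flags silence warnings about the input being a fragment.
    $doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

    // Build the merged paragraph in a fresh document.
    $out = new DOMDocument();
    $p = $out->createElement('p');
    $out->appendChild($p);

    $first = true;
    foreach ($doc->getElementsByTagName('p') as $node) {
        if (!$first) {
            $p->appendChild($out->createElement('br'));
        }
        // textContent drops nested tags; trim removes the surrounding newlines.
        $p->appendChild($out->createTextNode(trim($node->textContent)));
        $first = false;
    }

    // Serialize only the <p> element, not a whole HTML document.
    return $out->saveHTML($p);
}

echo mergeParagraphs("<p>\nYay\n</p>\n<p>\nStackOverflow\n</p>");
```

For the sample input this should print `<p>Yay<br>StackOverflow</p>`. The HTML serializer writes `<br>` rather than `<br />`.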

I think you're making it more complicated than it needs to be. Given that you want to collapse:

<p>Yay</p><p>StackOverFlow</p>

into:

<p>Yay<br />StackOverflow</p>

Then just replace each instance of </p><p> with a <br /> tag: preg_replace('/<\/p>\s*<p>/', '<br/>', $input).
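As a quick check, here is that substitution applied to the multi-line input from the question. The variable names are illustrative, and `#` is used as the pattern delimiter only so the slash in </p> needs no escaping:

```php
<?php
// Sample input modeled on the question: two paragraphs separated by a newline.
$input = "<p>\nYay\n</p>\n<p>\nStackOverflow\n</p>";

// Replace every "</p>, optional whitespace, <p>" boundary with a line break.
$merged = preg_replace('#</p>\s*<p>#', '<br />', $input);

echo $merged;
```

The result is a single paragraph with the two texts separated by `<br />`. The newlines inside the original paragraphs are preserved, which makes no difference to how HTML renders.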


In general, however, note that use of regular expressions for this kind of complex parsing is fraught with peril. More succinctly:

"Some people, when faced with a problem, think, 'I know, I'll use regular expressions.' Now they have two problems." -- Jamie Zawinski

My advice: regex can only go so far. See one of my posts: Extracting text fragment from a HTML body (in .NET)

It includes a regex for string truncation too.

