
Premature group capturing PHP regex

I have HTML stored in a MySQL database that I am migrating to a new WordPress installation from Joomla. I need to remove some caption text at the bottom of each page.

An example of the HTML:

<a href="some/link">link 1</a><p>some really long description</p><a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]

I am using a PHP script to query the database and do the regex matching.

My regex thus far:

/(<\/a>)(.*?)(\[\/caption\])/

I need to remove the 2nd capture group (CAPTION TEXT HERE) entirely, in essence replacing groups 1, 2 and 3 with groups 1 and 3. Group 2 can contain any alphanumeric or special character.

The problem I am running into is that capture group 1 matches the closing anchor tag for link 1, and group 2 then continues all the way to the [/caption].

The pattern matches this span:

</a><p>some really long description</p><a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]

so after the replacement the whole string becomes:

<a href="some/link">link 1</a>[/caption]

when what I really need is:

<a href="some/link">link 1</a><p>some really long description</p><a href="another/link">link 2</a>[/caption]
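
The behavior described above can be reproduced with a short sketch. The original script is not shown, so the variable names and the use of preg_replace with a '$1$3' replacement are assumptions; only the pattern and the sample HTML come from the post.

```php
<?php
// Sample HTML from the question; $html is an illustrative variable name.
$html = '<a href="some/link">link 1</a><p>some really long description</p>'
      . '<a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]';

// The original pattern. The engine tries the leftmost starting point first,
// which is the </a> of link 1; the lazy .*? then grows one character at a
// time until [/caption] can match, swallowing everything in between.
$pattern = '/(<\/a>)(.*?)(\[\/caption\])/';

// Keep groups 1 and 3, drop group 2.
$result = preg_replace($pattern, '$1$3', $html);

echo $result, PHP_EOL;
// prints: <a href="some/link">link 1</a>[/caption]
```

A lazy quantifier only makes the match end as early as possible; it does not move the start of the match to a later </a>.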

Thank you in advance!

Make group 2 exclude > from the matched text, so the match cannot run across an earlier tag:

(<\/a>)([^>]*?)(\[\/caption\])
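
As a minimal sketch of how this pattern might be applied, assuming the replacement is done with preg_replace and a '$1$3' replacement string (the variable names are illustrative, and the / delimiters are added because PHP requires them):

```php
<?php
// Sample HTML from the question; $html is an illustrative variable name.
$html = '<a href="some/link">link 1</a><p>some really long description</p>'
      . '<a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]';

// Group 2 may not contain ">", so an attempt starting at the first </a>
// fails at the "<p>" tag and the engine moves on to the last </a>.
$pattern = '/(<\/a>)([^>]*?)(\[\/caption\])/';

// Keep groups 1 and 3, drop the caption text in group 2.
$fixed = preg_replace($pattern, '$1$3', $html);

echo $fixed, PHP_EOL;
// prints: <a href="some/link">link 1</a><p>some really long description</p><a href="another/link">link 2</a>[/caption]
```

One caveat: because group 2 excludes >, a caption that itself contains a > character will not be matched by this pattern and would be left unchanged.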

