Premature group capture in a PHP regex

Question

I have HTML stored in a MySQL database that I am migrating from Joomla to a new WordPress installation. I need to remove some caption text at the bottom of each page.

An example of the HTML:

<a href="some/link">link 1</a><p>some really long description</p><a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]

I am using a PHP script to query the database and do the regex matching.

My regex thus far:

/(<\/a>)(.*?)(\[\/caption\])/

I need to remove the 2nd capture group (CAPTION TEXT HERE) entirely, so in essence replacing groups 1, 2 and 3 with groups 1 and 3. Group 2 can contain any alphanumeric or special character.

The problem I am running into is that capture group 1 matches the closing anchor tag for link 1, and the match then continues all the way to the [/caption].

What happens is:

</a><p>some really long description</p><a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]

gets replaced with:

<a href="some/link">link 1</a>[/caption]

when what I really need is:

<a href="some/link">link 1</a><p>some really long description</p><a href="another/link">link 2</a>[/caption]

Thank you in advance!

Answer 1

Make it not include > in the matched text:

(<\/a>)([^>]*?)(\[\/caption\])
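Why this helps: the regex engine reports the leftmost match, and a lazy `.*?` only keeps the right end of the match short. It does not stop the match from starting at the first `</a>`. With `[^>]*?` the middle group cannot cross a `>`, so an attempt that starts at the first `</a>` fails as soon as it reaches the `>` of `<p>`, and the engine moves on to the `</a>` directly before the caption. A minimal sketch of how this could be wired into `preg_replace`, using the sample HTML from the question (the variable names are only illustrative, and it assumes the caption text itself never contains a `>`):

```php
<?php
// Sample HTML from the question.
$html = '<a href="some/link">link 1</a><p>some really long description</p>'
      . '<a href="another/link">link 2</a>CAPTION TEXT HERE[/caption]';

// Group 2 may not contain '>', so the match can only start at the
// last </a> before [/caption].
$pattern = '/(<\/a>)([^>]*?)(\[\/caption\])/';

// Keep groups 1 and 3, drop group 2 (the caption text).
$cleaned = preg_replace($pattern, '$1$3', $html);

echo $cleaned, PHP_EOL;
```

If a caption can contain a `>` character, this pattern will not match that caption and the text will be left in place, so it is worth checking a few real rows before running it over the whole table.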


Answer 1
1 accepted 2015-06-24 17:16:21
