Regular expression for anchor tags

Question

I am using PHP and I am having trouble parsing the href from an anchor tag that contains text.

Example: an anchor tag whose text is http://www.test.com

Like this: <a href="http://www.test.com" title="test">http://www.test.com</a>

I want to match all of the text inside the anchor tag.

Thanks in advance.

Answer 1

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
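Since the question asks about the href as well as the text, here is a minimal sketch that collects both with the same DOM approach. The function name `extractLinks` and the array keys are made up for illustration; the `libxml_use_internal_errors()` calls are only there on the assumption that the input may be a fragment or sloppy markup that would otherwise raise warnings.

```php
<?php
// Hypothetical helper: returns an href/text pair for every <a> in an HTML string.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'), // attribute value
            'text' => $a->textContent,          // text of the element and its descendants
        ];
    }
    return $links;
}

$links = extractLinks('<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world');
print_r($links);
```

`getAttribute()` returns an empty string when the attribute is missing, so anchors without an href would still show up in the result.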

Answer 2

Assuming you wish to select the link text of an anchor link with that href, then something like this should work:

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is technically not perfect (in theory, a > character can appear inside one of the tags, for example in an attribute value), but it will work in 99% of cases. As several of the comments have mentioned, though, you should be using a DOM parser.
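If the href is not known in advance, the same idea can be loosened to capture it. The following is only a sketch under stated assumptions: the pattern is not from the answer, it handles double-quoted href values only, and it has the same weakness with a stray > inside a tag.

```php
<?php
// Assumed pattern: group 1 is the href value, group 2 is the link text.
// 'i' makes it case-insensitive, 's' lets '.' match newlines in the link text.
$pattern = '#<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>#is';

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a> '
       . '<a title="x" href="http://example.org">Example</a>';

// PREG_SET_ORDER groups the results per match: $matches[0] is the first link, and so on.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], PHP_EOL;
}
```

Because `[^>]*` precedes `href`, the attribute may appear anywhere in the opening tag, but the pattern would also accept an attribute whose name merely ends in `href`.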

Answer 3

If you have already obtained the anchor tag, you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
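As a rough illustration of how such a pattern might be applied in PHP, assuming the opening tag has already been isolated into a string (the variable names here are hypothetical):

```php
<?php
// $tag is assumed to be an opening anchor tag that was isolated earlier.
$tag = '<a href="http://www.test.com" title="test">';

$href = null;
// The capture group uses [^"]* so it takes everything up to the closing quote.
if (preg_match('/<a [^>]*href="([^"]*)"[^>]*>/', $tag, $m)) {
    $href = $m[1];
    echo $href, PHP_EOL;
}
```

This only covers double-quoted attribute values; single-quoted or unquoted values would need a different pattern.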

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately, PHP isn't a good language in which to learn how to do this, so I wouldn't recommend using this project to learn how.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.
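To make the nested-tag problem concrete, here is a small comparison on an invented input. It is a sketch only; the sample markup and variable names are assumptions, not part of the question.

```php
<?php
// Invented sample: link text wrapped in nested formatting tags.
$html = '<a href="http://www.test.com"><b>bold</b> and <i>italic</i></a>';

// Regex approach: the capture group keeps the inner markup verbatim.
preg_match('#<a [^>]*>(.*?)</a>#s', $html, $m);
$viaRegex = $m[1];

// DOM approach: textContent concatenates the text of all descendant nodes.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);
$viaDom = $dom->getElementsByTagName('a')->item(0)->textContent;

echo $viaRegex, PHP_EOL; // <b>bold</b> and <i>italic</i>
echo $viaDom, PHP_EOL;   // bold and italic
```

The regex result would need a second pass to strip the inner tags, whereas the DOM result is already plain text.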


3 answers

Answer 1
6 2010-07-29 10:10:07

Answer 2
-1 2010-07-29 10:09:14

Answer 3
-1 2010-07-29 10:09:41
