Regular expression anchor tag

I am using PHP and I am having trouble parsing the href from an anchor tag together with its text.

Example: an anchor tag whose text is http://www.test.com

like this: <a href="http://www.test.com" title="test">http://www.test.com</a>

I want to match all the text inside the anchor tag.

Thanks in advance.

Use DOM:

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$dom = new DOMDocument();
$dom->loadHTML($text);

foreach ($dom->getElementsByTagName('a') as $a) {
    echo $a->textContent;
}

DOM is specifically designed to parse XML and HTML. It will be more robust than any regex solution you can come up with.
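Since the question asks for the href as well as the text, the same DOM approach can return both. The following is a minimal sketch, not part of the original answer: the helper name extractLinks is hypothetical, and the libxml_use_internal_errors() calls are only there on the assumption that real-world input may contain markup that makes libxml emit warnings.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns the href and text of every <a> element in an HTML fragment.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = [
            'href' => $a->getAttribute('href'),
            'text' => $a->textContent,
        ];
    }
    return $links;
}

$text = '<a href="http://www.test.com" title="test">http://www.test.com</a> something else hello world';
$links = extractLinks($text);
print_r($links);
```

Here $links should contain a single entry whose 'href' and 'text' are both http://www.test.com; the trailing "something else hello world" is outside the anchor and is not included.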

Assuming you wish to select the link text of an anchor whose href is that URL, something like this should work:

$input = '<a href="http://www.test.com" title="test">http://www.test.com</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

if (preg_match($pattern, $input, $out)) {
    echo $out[1];
}

This is technically not perfect (in theory a > can appear inside one of the attribute values, which would end the [^>]* match early), but it will work in 99% of cases. As several of the comments have mentioned, though, you should be using a DOM.
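To make the caveat concrete, here is a small sketch of the failure case. The input string with title="a > b" is an invented example, not taken from the question; the pattern is the one from this answer.

```php
<?php
// Invented input: the title attribute contains a literal ">".
$input = '<a href="http://www.test.com" title="a > b">link</a>';
$pattern = '#<a href="http://www\.test\.com"[^>]*>(.*?)</a>#';

// The regex treats the ">" inside the attribute as the end of the tag,
// so the capture starts in the middle of the attribute value.
preg_match($pattern, $input, $out);
$regexText = $out[1];
echo $regexText, PHP_EOL;   // prints:  b">link

// DOM understands quoted attribute values and returns only the link text.
$dom = new DOMDocument();
$dom->loadHTML($input);
$domText = $dom->getElementsByTagName('a')->item(0)->textContent;
echo $domText, PHP_EOL;     // prints: link
```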

If you have already obtained the anchor tag, you can extract the href attribute via a regex easily enough:

<a [^>]*href="([^"]*)"[^>]*>
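A sketch of applying a pattern of this shape with preg_match, assuming $tag already holds a single anchor start tag with a double-quoted href. The capture group is written as ([^"]*) so that it captures the whole attribute value rather than a single character; the variable names are illustrative only.

```php
<?php
// Assumption: the anchor start tag has already been isolated.
$tag = '<a href="http://www.test.com" title="test">';

// Group 1 captures everything between the quotes of href.
$pattern = '/<a [^>]*href="([^"]*)"[^>]*>/';

$href = null;
if (preg_match($pattern, $tag, $m)) {
    $href = $m[1];
}
echo $href, PHP_EOL;   // prints: http://www.test.com
```

Note that this only handles double-quoted values; single-quoted or unquoted href attributes would need a different pattern.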

If you instead want to extract the contents of the tag and you know what you are doing, it isn't too hard to write a simple recursive descent parser, using cascading regexes, that will parse all but the most pathological cases. Unfortunately PHP isn't a good language for learning how to do this, so I wouldn't recommend using this project to learn it.

So if it is the contents you are after, not the attribute, then @katrielalex is right: don't parse HTML with regex. You will run into a world of hurt with nested formatting tags and other legal HTML that isn't compatible with regular expressions.
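As an illustration of the nested-formatting problem, the sketch below compares a simple regex capture with DOM on an invented input whose link text contains <b> and <i> tags. The pattern is a generic one chosen for the example, not one proposed in the answers.

```php
<?php
// Invented input: the link text contains nested formatting tags.
$html = '<a href="http://www.test.com"><b>bold</b> and <i>italic</i> text</a>';

// A regex capture returns the inner markup verbatim, tags included.
preg_match('#<a [^>]*>(.*?)</a>#', $html, $m);
$regexText = $m[1];
echo $regexText, PHP_EOL;   // prints: <b>bold</b> and <i>italic</i> text

// DOM's textContent concatenates the text of all descendant nodes.
$dom = new DOMDocument();
$dom->loadHTML($html);
$domText = $dom->getElementsByTagName('a')->item(0)->textContent;
echo $domText, PHP_EOL;     // prints: bold and italic text
```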
