What's wrong with my PHP regex?
I'm trying to pull a specific link from a feed where all of the content is on one line and there are multiple links present. The one I want has "[link]" as the content of the A tag. Here's my example:
<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>
... could be more links before and/or after
How do I isolate just the href with the content "[link]"?
This regex ends at the correct place in the block I want, but starts at the first link:
(?<=href\=\").*?(?=\[link\])
Any help would be greatly appreciated! Thanks.
Alternatively :)
This code:
$string = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>';
$regex = '/href="([^"]*)">\[link\]/i';
$result = preg_match($regex, $string, $matches);
var_dump($matches);
Will return:
array(2) {
[0] =>
string(41) "href="http://www.amazingpage.com/">[link]"
[1] =>
string(27) "http://www.amazingpage.com/"
}
You can avoid regular expressions and use the DOM to do this.
$doc = new DOMDocument();
// loadHTML() is an instance method; the static call form is an error on PHP 8.
$doc->loadHTML('
<a href="google.com/">test1</a>
<a href="google.com/">test2</a>
<a href="http://www.amazingpage.com/">[link]</a>
<a href="google.com/">test3</a>
<a href="google.com/">test4</a>
');
// Print the href of every link whose text is exactly "[link]".
foreach ($doc->getElementsByTagName('a') as $link) {
    if ($link->nodeValue == '[link]') {
        echo $link->getAttribute('href');
    }
}
With DOMDocument and XPath:
$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[. = "[link]"]/@href') as $node) {
    echo $node->nodeValue;
}
or if you are looking for only one result:
$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);
// The parentheses make [1] select the first match in the whole document.
$nodeList = $xpath->query('(//a[. = "[link]"])[1]/@href');
if ($nodeList->length) {
    echo $nodeList->item(0)->nodeValue;
}
XPath query details:
//a # 'a' tag everywhere in the DOM tree
[. = "[link]"] # (condition) which has "[link]" as value
/@href # "href" attribute
The reason your regex pattern doesn't work:
The regex engine walks the string from left to right and tries to match at each position in turn. The lookbehind (?<=href\=\") is first satisfied right after the first href=", and from there .*? simply grows one character at a time until [link] follows, which it eventually does. So even with a non-greedy quantifier, you always get the leftmost match.
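To see the leftmost-match behaviour concretely, here is a minimal sketch that runs the pattern from the question next to a tightened variant. The variant is only one possible fix and is not taken from the answers above: it assumes the href value never contains a double quote and that "[link]" directly follows the closing "> of the tag. The variable names $original, $fixed, $m1 and $m2 are made up for the illustration.

```php
<?php
// Sample input copied from the question.
$string = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
        . '<a href="http://www.amazingpage.com/">[link]</a> '
        . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// Pattern from the question (escapes simplified): the lookbehind succeeds
// right after the first href=", then .*? keeps growing until "[link]" follows.
$original = '/(?<=href=").*?(?=\[link\])/';

// Illustrative fix (an assumption, not the asker's pattern): [^"]* cannot
// cross the closing quote, so attempts that start at the earlier links fail
// and the engine moves on to the link that is followed by ">[link].
$fixed = '/(?<=href=")[^"]*(?=">\[link\])/';

preg_match($original, $string, $m1);
preg_match($fixed, $string, $m2);

// The original match begins inside the first link, not the wanted one.
var_dump(strpos($m1[0], 'google.com/">test1') === 0); // bool(true)

// The tightened pattern yields only the wanted URL.
echo $m2[0], "\n"; // http://www.amazingpage.com/
```

The capture-group pattern shown earlier works for the same reason: [^"]* keeps each attempt inside a single attribute value, so the engine cannot stretch a match from the first link to the "[link]" text.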