I'm trying to pull a specific link from a feed where all of the content is on one line and there are multiple links present. The one I want has "[link]" as the content of the A tag. Here's my example:
<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>
... could be more links before and/or after
How do I isolate just the href with the content "[link]"?
This regex goes to the correct end of the block I want, but starts at the first link:
(?<=href\=\").*?(?=\[link\])
Any help would be greatly appreciated! Thanks.
Try this updated regex:
(?<=href\=\")[^<]*?(?=\">\[link\])
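The problem is that the dot matches too many characters: .*? is free to run across tag boundaries, so the match still starts at the first href. To get the right href, restrict that part of the regex to [^<]*?, which cannot cross a "<" and therefore cannot run into the next tag.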
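As a rough sketch of how the updated pattern might be applied in PHP (the variable names $html, $pattern and $m are illustrative and not part of the original answer), it can be passed to preg_match against the sample line from the question:

```php
<?php
// Sample feed line from the question.
$html = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> '
      . '<a href="http://www.amazingpage.com/">[link]</a> '
      . '<a href="google.com/">test3</a><a href="google.com/">test4</a>';

// [^<]*? cannot cross a "<", so an attempt that starts at an earlier
// href fails before it can reach the ">[link]" lookahead.
$pattern = '/(?<=href=")[^<]*?(?=">\[link\])/';

if (preg_match($pattern, $html, $m)) {
    echo $m[0], PHP_EOL; // prints http://www.amazingpage.com/
}
```

Because both ends are lookarounds, $m[0] holds only the URL, with no surrounding href=" or "> to strip.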
Alternatively :)
This code:
$string = '<a href="google.com/">test1</a> <a href="google.com/">test2</a> <a href="http://www.amazingpage.com/">[link]</a> <a href="google.com/">test3</a><a href="google.com/">test4</a>';
$regex = '/href="([^"]*)">\[link\]/i';
$result = preg_match($regex, $string, $matches);
var_dump($matches);
Will output:
array(2) {
[0] =>
string(41) "href="http://www.amazingpage.com/">[link]"
[1] =>
string(27) "http://www.amazingpage.com/"
}
You can avoid regular expressions altogether and use the DOM to do this.
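// loadHTML() is an instance method; calling it statically fails on current PHP versions.
$doc = new DOMDocument();
$doc->loadHTML('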
<a href="google.com/">test1</a>
<a href="google.com/">test2</a>
<a href="http://www.amazingpage.com/">[link]</a>
<a href="google.com/">test3</a>
<a href="google.com/">test4</a>
');
foreach ($doc->getElementsByTagName('a') as $link) {
if ($link->nodeValue == '[link]') {
echo $link->getAttribute('href');
}
}
With DOMDocument and XPath:
$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//a[. = "[link]"]/@href') as $node) {
echo $node->nodeValue;
}
or if you are looking for only one result:
$dom = new DOMDocument();
$dom->loadHTML($yourHTML);
$xpath = new DOMXPath($dom);
// The parentheses make [1] select the first match in the whole document.
$nodeList = $xpath->query('(//a[. = "[link]"])[1]/@href');
if ($nodeList->length) {
    echo $nodeList->item(0)->nodeValue;
}
XPath query details:
//a # "a" elements anywhere in the DOM tree
[. = "[link]"] # (condition) whose text content is exactly "[link]"
/@href # the "href" attribute of each matching element
The reason your regex pattern doesn't work:
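The regex engine walks the string from left to right and tries to make the pattern succeed at each position in turn. The first position where the lookbehind for href=" holds is right after the first link's href=", and from there the lazy .*? simply keeps expanding, across the following tags, until [link] comes next. So even with a non-greedy quantifier, you always get the leftmost match, not the shortest one.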
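The leftmost-match behaviour can be seen in a small sketch. The shortened two-link $html string and the variable names are made up for illustration; the second pattern borrows the [^"]* idea from the preg_match answer above:

```php
<?php
// Shortened, hypothetical version of the feed line: one decoy link, then the target.
$html = '<a href="google.com/">test1</a> <a href="http://www.amazingpage.com/">[link]</a>';

// Original pattern: the match begins right after the FIRST href=" and the
// lazy .*? grows from there until "[link]" follows.
preg_match('/(?<=href=").*?(?=\[link\])/', $html, $lazyDot);
echo $lazyDot[0], PHP_EOL;
// prints google.com/">test1</a> <a href="http://www.amazingpage.com/">

// Bounded pattern: [^"]* cannot leave the attribute value, so the attempt at
// the first href fails and the engine moves on to the second one.
preg_match('/(?<=href=")[^"]*(?=">\[link\])/', $html, $bounded);
echo $bounded[0], PHP_EOL;
// prints http://www.amazingpage.com/
```

Laziness only affects where a match ends once its starting position is fixed; ruling out the earlier starting positions takes a more restrictive character class.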