The following PHP code uses cURL and XPath to display all the links on a given page ($target_url).
What I'm trying to do is figure out how to display only the anchor text (the linked words inside an <a href>) on a given page when I supply the website value.
For example, I want to search "randomwebsite.com" to see whether it contains a link to my target_url (e.g. ebay.com) and, if it does, display just that link's anchor text, "auction website":
<a href='http://www.ebay.com'>auction website</a>
```php
<?php
$target_url = "http://www.ebay.com";
$userAgent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';

// make the cURL request to $target_url
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_URL, $target_url);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$html = curl_exec($ch);
if (!$html) {
    echo "<br />cURL error number:" . curl_errno($ch);
    echo "<br />cURL error:" . curl_error($ch);
    exit;
}
curl_close($ch);

// parse the html into a DOMDocument
$dom = new DOMDocument();
@$dom->loadHTML($html);

// grab all the <a> elements on the page
$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('/html/body//a');
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo "<br />Link: $url";
}
?>
```
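The @ in front of loadHTML() silences every warning libxml emits about malformed markup, which real-world pages usually contain. A minimal sketch of an alternative that buffers those warnings with libxml_use_internal_errors() so they can be inspected or logged instead of discarded; the inline $html string is a made-up stand-in for the cURL response:

```php
<?php
// Made-up sample markup with a stray </b>; stands in for the page fetched by cURL.
$html = "<html><body><p>Intro</p><a href='http://www.ebay.com'>auction website</b></a></body></html>";

// Buffer libxml parse warnings instead of printing them (replaces the @ operator).
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// Collect whatever libxml complained about, clear the buffer, restore the old setting.
$parseErrors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$hrefs = $xpath->query('/html/body//a');
echo "Parsed " . $hrefs->length . " link(s), " . count($parseErrors) . " parse warning(s)\n";
```

Each entry in $parseErrors is a LibXMLError object with message and line properties, so the warnings can be logged rather than lost.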
You can get the link text with $href->nodeValue inside your existing loop. That doesn't cover links whose content is an image tag or similar, but I think it is what you were specifically asking for.
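A minimal sketch of that idea applied to the question's goal, run against a small inline HTML string instead of a cURL response (the sample markup and $target_host are assumptions). It keeps only links whose host equals the target host, prints each one's nodeValue, and falls back to an image's alt text when the link contains no text:

```php
<?php
// Hypothetical sample page; in the question this HTML comes from curl_exec().
$html = <<<HTML
<html><body>
  <a href="http://www.ebay.com">auction website</a>
  <a href="http://www.google.com">search engine</a>
  <a href="http://www.ebay.com/"><img src="logo.png" alt="eBay logo"></a>
</body></html>
HTML;

// Assumption: a link "points at" the target if its host matches exactly.
$target_host = 'www.ebay.com';

$dom = new DOMDocument();
@$dom->loadHTML($html); // mirrors the question's call

$xpath = new DOMXPath($dom);
$anchorTexts = [];
foreach ($xpath->query('/html/body//a[@href]') as $a) {
    // Compare only the host, so "http://www.ebay.com" and "http://www.ebay.com/" both match.
    if (parse_url($a->getAttribute('href'), PHP_URL_HOST) !== $target_host) {
        continue;
    }
    // nodeValue is all text inside the <a>; trim surrounding whitespace.
    $text = trim($a->nodeValue);
    // Image-only links have no text, so fall back to the first <img> alt attribute.
    if ($text === '') {
        $img = $xpath->query('.//img[@alt]', $a)->item(0);
        $text = $img ? $img->getAttribute('alt') : '';
    }
    $anchorTexts[] = $text;
    echo "<br />Anchor text: $text";
}
```

Matching on the host rather than the full href string treats http://www.ebay.com and http://www.ebay.com/some/page as the same site, but it will not match ebay.com without the www prefix; adjust the comparison if that matters.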
I'm not sure I understood what you're asking, but is this what you want to implement?
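```php
$url_matches = array('www.ebay.com'   => 'Auction Site',
                     'www.google.com' => 'Search Engine'
);
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    // in_array() searches values, not keys, so look up the link's host as a key instead
    $host = parse_url($url, PHP_URL_HOST);
    if ($host && isset($url_matches[$host])) {
        $url = $url_matches[$host];
    }
    echo "<br />Link: $url";
}
```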
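Note that in_array() searches an array's values, so a lookup by URL needs isset() or array_key_exists() on the keys, and an href such as http://www.ebay.com must first be reduced to its host to match a key like www.ebay.com. A minimal sketch of that lookup as a standalone function; label_for_link() is a hypothetical name, and lowercasing the host is an added assumption (host names are case-insensitive):

```php
<?php
// Map host names to display labels (same data as the answer's $url_matches).
$url_matches = [
    'www.ebay.com'   => 'Auction Site',
    'www.google.com' => 'Search Engine',
];

// Hypothetical helper: return the label for a link's host, or the URL itself if unknown.
function label_for_link(string $url, array $url_matches): string
{
    // parse_url() returns null for relative URLs such as "/about" and false for badly malformed ones.
    $host = parse_url($url, PHP_URL_HOST);
    if (is_string($host)) {
        $key = strtolower($host); // normalise case before the key lookup
        if (isset($url_matches[$key])) {
            return $url_matches[$key];
        }
    }
    return $url;
}

echo label_for_link('http://www.ebay.com/itm/123', $url_matches), "\n"; // Auction Site
echo label_for_link('/about', $url_matches), "\n";                      // /about
```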