Extract href from HTML page using PHP
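I am trying to extract the news headlines and the link (href) for each headline using the code below, but the link extraction does not work. Only the headlines are printed. Please help me find out what is wrong with the code.
The page I want to get the headlines and links from: http://web.tmxmoney.com/news.php?qm_symbol=BCM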
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new domDocument;
@$dom->loadHTML($data);
$dom->preserveWhiteSpace = true;
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
$cols = $row->getElementsByTagName('span');
$newstitle = $cols->item(0)->nodeValue;
$link = $cols->item(0)->nodeType === HTML_ELEMENT_NODE ? $cols->item(0)->getElementsByTagName('a')->item(0)->getAttribute('href') : '';
echo $newstitle . '<br>';
echo $link . '<br><br>';
}
?>
Thanks in advance for your help!
Try this:
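<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new DOMDocument();
@$dom->loadHTML($data); // @ suppresses warnings caused by malformed HTML
$xpath = new DOMXPath($dom);
// Select every anchor inside the body.
$hrefs = $xpath->query('/html/body//a');
foreach ($hrefs as $href) {
    $url = $href->getAttribute('href');
    // Strip characters that are not allowed in a URL.
    $url = filter_var($url, FILTER_SANITIZE_URL);
    // Keep only absolute URLs; relative hrefs fail this validation.
    if (filter_var($url, FILTER_VALIDATE_URL) !== false) {
        $safe = htmlspecialchars($url, ENT_QUOTES); // escape before echoing into HTML
        echo '<a href="' . $safe . '">' . $safe . '</a><br />';
    }
}
?>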
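One caveat with the snippet above: FILTER_VALIDATE_URL only accepts absolute URLs, so any relative href on the page is silently dropped. The sketch below shows one way to turn such hrefs into absolute URLs before validating them. `resolveHref` is a hypothetical helper, not part of PHP, and the example hrefs are made up for illustration. It is deliberately simplified and does not collapse `../` or `./` segments.

```php
<?php
// Hypothetical helper: resolve an href against the URL of the page it came from.
// Simplified on purpose: "../" and "./" segments are not normalised.
function resolveHref(string $href, string $base): string
{
    // An href that already has a scheme (http:, https:, mailto:, ...) is left alone.
    if (is_string(parse_url($href, PHP_URL_SCHEME))) {
        return $href;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Protocol-relative href such as //example.com/x: reuse the base scheme.
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;
    }
    // Root-relative href such as /news/1: append to the origin.
    if (strpos($href, '/') === 0) {
        return $origin . $href;
    }
    // Otherwise resolve relative to the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}

$base = 'http://web.tmxmoney.com/news.php?qm_symbol=BCM';
echo resolveHref('/news/1', $base), PHP_EOL;          // made-up root-relative href
echo resolveHref('article.php?id=1', $base), PHP_EOL; // made-up relative href
```

Calling `resolveHref` on each `$url` before the `FILTER_VALIDATE_URL` check would let relative links through as absolute ones.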
I found the solution. Here it is:
<?php
$data = file_get_contents('http://web.tmxmoney.com/news.php?qm_symbol=BCM');
$dom = new DOMDocument;
@$dom->loadHTML($data); // @ suppresses warnings caused by malformed HTML
$xpath = new DOMXPath($dom);
$rows = $xpath->query('//div');
foreach ($rows as $row) {
    // Look up the <a> under the div itself rather than inside the first <span>.
    $anchor = $row->getElementsByTagName('a')->item(0);
    $spans = $row->getElementsByTagName('span');
    // Skip divs without a link or without the two expected spans.
    if ($anchor === null || $spans->length < 2) {
        continue;
    }
    $link = $anchor->getAttribute('href');
    $title = $spans->item(0)->nodeValue;  // first span: headline
    $source = $spans->item(1)->nodeValue; // second span: source
    echo $title . '<br>';
    echo $source . '<br>';
    echo $link . '<br><br>';
}
?>
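As an alternative to looping over every div, the XPath expression itself can select only the anchors of interest. The following is a minimal sketch, not the page's actual structure: the `$html` string is invented stand-in markup, `extractNewsLinks` is a hypothetical function name, and the `//span//a[@href]` expression assumes each headline link sits inside a span.

```php
<?php
// Invented stand-in markup; the real page's structure may differ.
$html = <<<HTML
<html><body>
<div><span><a href="/news/1">First headline</a></span><span>Source A</span></div>
<div><span>No link here</span></div>
<div><span><a href="http://example.com/news/2">Second headline</a></span></div>
</body></html>
HTML;

// Hypothetical helper: returns a list of ['title' => ..., 'link' => ...] pairs.
function extractNewsLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $items = [];
    // Match only anchors that carry an href and are nested inside a span.
    foreach ($xpath->query('//span//a[@href]') as $a) {
        $items[] = [
            'title' => trim($a->textContent),
            'link'  => $a->getAttribute('href'),
        ];
    }
    return $items;
}

foreach (extractNewsLinks($html) as $item) {
    echo $item['title'], ' => ', $item['link'], PHP_EOL;
}
```

Because the query only returns anchors that have an href, no null checks are needed inside the loop. For the live page, `$html` would come from `file_get_contents` and the XPath expression would need adjusting to the actual markup.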