Extract Content From HTML with PHP
This is my HTML file:
<html>
<head>
<link href='http://wendyandgabe.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/>
<link href='http://wendyandgabe.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>
I want to extract the href URL of the <link> whose type="application/rss+xml" from the HTML file above. How can I do this? Can anyone show some example code?
You can use DomDocument (http://php.net/manual/de/class.domdocument.php) and DomXPath (http://de3.php.net/manual/de/class.domxpath.php):
$html = <<<EOF
<html>
<head>
<link href='http://wendyandgabe.blogspot.com/favicon.ico' rel='icon' type='image/x-icon'/>
<link href='http://wendyandgabe.blogspot.com/' rel='canonical'/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>
EOF;
$xml = new DomDocument;
$xml->loadHTML($html);
// create an XPath instance
$xpath = new DomXpath($xml);
// query for <link type="application/rss+xml"> and use the first found item
$link = $xpath->query('//link[@type="application/rss+xml"]')->item(0);
// item(0) returns null when nothing matches, so check before reading the attribute
if ($link !== null) {
    var_dump($link->getAttribute('href'));
}
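Since the question starts from an HTML *file* rather than a string, here is a minimal sketch, assuming PHP 7.1 or later, that runs the same XPath query on a file via DOMDocument::loadHTMLFile(). The helper name findRssFeedUrl() is hypothetical, and the libxml_use_internal_errors() calls are an addition that keeps libxml's warnings about sloppy real-world markup from being printed.

```php
<?php
// Hypothetical helper (not from the original answer): returns the href of the
// first <link type="application/rss+xml"> in an HTML file, or null if none.
function findRssFeedUrl(string $path): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them,
    // then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTMLFile($path);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//link[@type="application/rss+xml"]');
    // query() returns false for an invalid expression, an empty list for no match
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('href');
}
```

For a file containing the HTML from the question, findRssFeedUrl() should return http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss; it returns null when no such <link> exists or the file cannot be loaded.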
You can try PHP's built-in DOMDocument class.
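A minimal sketch of that idea, assuming PHP 7.1 or later: DOMDocument alone can do the job by iterating over getElementsByTagName('link') and comparing each element's type attribute, with no XPath involved. The function name findRssHrefWithoutXPath() is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answers): walks every <link>
// element using DOMDocument only and returns the href of the first one whose
// type attribute is "application/rss+xml", or null if there is none.
function findRssHrefWithoutXPath(string $html): ?string
{
    $doc = new DOMDocument();
    // Keep warnings about non-well-formed HTML from being printed while parsing.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('link') as $link) {
        if ($link->getAttribute('type') === 'application/rss+xml') {
            return $link->getAttribute('href');
        }
    }
    return null;
}
```

Looping in PHP is handy when the matching condition is awkward to express in XPath; for a single attribute comparison like this one, the XPath query is more concise.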
Using PHP Simple HTML DOM Parser, you can do it like this:
// include the Simple HTML DOM Parser library
include "simple_html_dom.php";
// a heredoc is used because the title attributes contain apostrophes ("O' Happy Day!"),
// which would terminate a single-quoted PHP string
$text = <<<EOF
<html>
<head>
<link href="http://wendyandgabe.blogspot.com/favicon.ico" rel="icon" type="image/x-icon"/>
<link href="http://wendyandgabe.blogspot.com/" rel="canonical"/>
<link rel="alternate" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://wendyandgabe.blogspot.com/feeds/posts/default" />
<link rel="alternate" type="application/rss+xml" title="O' Happy Day! - RSS" href="http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss" />
<link rel="service.post" type="application/atom+xml" title="O' Happy Day! - Atom" href="http://www.blogger.com/feeds/5390468261501503598/posts/default" />
</head>
<body>
</body>
</html>
EOF;
//Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($text);
// Find the link with the appropriate selectors
$link = $html->find('link[type=application/rss+xml]', 0);
// Check whether the find succeeded
if ($link) {
    $href = $link->href;
    echo $href;
} else {
    echo "Find function failed!";
}
// Clear the DOM object to free memory (mainly needed when loading many documents)
$html->clear();
unset($html);
OUTPUT
======
http://wendyandgabe.blogspot.com/feeds/posts/default?alt=rss