Unable to extract title from feed in PHP?

I am trying to extract data from this feed. This is my code:

$xml = file_get_contents_curl($feed_url);
$rss  = new DOMDocument();
$rss->load($xml);

The function file_get_contents_curl fetches the data from the web page. When I dump the result like this:

var_dump($xml);

it echoes everything as expected (I mean all the title, link, etc. tags). However, if I use var_dump on $rss:

var_dump($rss);

I get this response:

object(DOMDocument)#1 (34) { ["doctype"]=> NULL ["implementation"]=> string(22) "(object value omitted)" ["documentElement"]=> NULL ["actualEncoding"]=> NULL ["encoding"]=> NULL ["xmlEncoding"]=> NULL ["standalone"]=> bool(true) ["xmlStandalone"]=> bool(true) ["version"]=> string(3) "1.0" 
["xmlVersion"]=> string(3) "1.0" ["strictErrorChecking"]=> bool(true) ["documentURI"]=> NULL ["config"]=> NULL ["formatOutput"]=> bool(false) ["validateOnParse"]=> bool(false) ["resolveExternals"]=> bool(false) ["preserveWhiteSpace"]=> bool(true) 
["recover"]=> bool(false) ["substituteEntities"]=> bool(false) ["nodeName"]=> string(9) "#document" ["nodeValue"]=> NULL ["nodeType"]=> int(9) ["parentNode"]=> NULL ["childNodes"]=> string(22) "(object value omitted)" ["firstChild"]=> NULL 
["lastChild"]=> NULL ["previousSibling"]=> NULL ["attributes"]=> NULL ["ownerDocument"]=> NULL ["namespaceURI"]=> NULL ["prefix"]=> string(0) "" ["localName"]=> NULL ["baseURI"]=> NULL ["textContent"]=> string(0) "" } 

Now I can't extract the title or anything else from the feed. My code is like this:

foreach ($rss->getElementsByTagName('item') as $node) {
  $title = $node->getElementsByTagName('title')->item(0)->nodeValue;
}

The feed, however, shows an error if you open it in Chrome ("error on line 273 at column 11: Encoding error"), although it opens fine in Firefox. Still, I would expect to be able to parse the feed up to the first point of error.

Here is a sample of the feed:

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>eBaum's World - Featured Media</title>
    <link>http://www.ebaumsworld.com</link>
    <atom:link href="http://www.ebaumsworld.com/rss/featured/" rel="self" type="application/rss+xml" />
    <description>The latest featured media</description>
    <language>en-us</language>
    <copyright>eBaum's World (c) 1998-2015</copyright>
    <lastBuildDate>Wed, 25 Nov 2015 03:31:12 -0500</lastBuildDate>
    <pubDate>Wed, 25 Nov 2015 03:31:12 -0500</pubDate>
            <item>
        <title>24 People Being Complete A$$holes</title>
        <link>http://www.ebaumsworld.com/pictures/view/84832600/</link>
        <description>
            <![CDATA[
            <table cellspacing="0" cellpadding="2" width="100%" border="0">
                <tr>
                    <td valign="top" width="120">
                        <a href="http://www.ebaumsworld.com/pictures/view/84832600/"><img width="320" height="220" src="http://cdn.ebaumsworld.com/thumbs/2015/11/24/070634/84832600/assholes.jpg" border="0" /></a>
                    </td>
                    <td valign="top">
                        People acting like such mega-jerks it might send you into a blind rage!                     </td>
                </tr>
            </table>
            ]]>
        </description>
        <pubDate>Tue, 24 Nov 2015 23:02:00 -0500</pubDate>
        <enclosure type="image/jpg" url="http://cdn.ebaumsworld.com/thumbs/2015/11/24/070634/84832600/assholes.jpg" length="10000"/>
        <guid isPermaLink="false">http://www.ebaumsworld.com/pictures/view/84832600/</guid>
    </item>

This is my function definition for file_get_contents_curl:

function file_get_contents_curl($url) {
  $agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)';
  $ch = curl_init();
  curl_setopt($ch, CURLOPT_USERAGENT, $agent);
  curl_setopt($ch, CURLOPT_HEADER, 0);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

  $data = curl_exec($ch);
  curl_close($ch);

  return $data;
}

Just use SimpleXMLElement and access the XML nodes.

$xml = file_get_contents_curl($feed_url);
$x = new SimpleXMLElement($xml);

foreach ($x as $node) {
  print $node->title . PHP_EOL;
  print $node->description . PHP_EOL;
}

will output

eBaum's World - Featured Media
The latest featured media
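
Note that iterating $x directly only visits the children of the root element, i.e. the single channel element, which is why the output above shows the channel title and description rather than the item titles. To reach the individual items, something like the following should work. It is a minimal sketch: the inline $xml string is a trimmed, hypothetical stand-in for whatever file_get_contents_curl() returns, and $titles is a name introduced here purely for illustration.

```php
<?php
// Hypothetical inline sample standing in for the string returned by
// file_get_contents_curl(); trimmed from the feed excerpt in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
    <title>eBaum's World - Featured Media</title>
    <description>The latest featured media</description>
    <item>
        <title>24 People Being Complete A$$holes</title>
        <link>http://www.ebaumsworld.com/pictures/view/84832600/</link>
    </item>
</channel>
</rss>
XML;

$x = new SimpleXMLElement($xml);

// The items live one level below <channel>, so address them explicitly.
$titles = [];
foreach ($x->channel->item as $item) {
    // Cast to string: each property is itself a SimpleXMLElement object.
    $titles[] = (string) $item->title;
}

print implode(PHP_EOL, $titles) . PHP_EOL;
```

The string cast matters if the titles are stored or compared later, since without it you keep SimpleXMLElement objects rather than plain strings.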

With the URL of the actual XML feed, you can simply load it via simplexml_load_file():

<?php
$url = "http://feeds.feedburner.com/ebaumsworld/aUjW";
// simplexml_load_file() fetches and parses the feed in one step
$xml = simplexml_load_file($url);
foreach ($xml->channel->item as $item) {
    echo $item->title . PHP_EOL;
}
// possible output: 24 People Being Complete A$$holes
?>
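
If you would rather keep the DOMDocument code from the question, the empty document in the var_dump is most likely explained by the call to load(): that method expects a file path or URL, whereas the string returned by curl has to go through loadXML(). Below is a minimal sketch of that variant. The inline $xml string is a hypothetical stand-in for the fetched feed, and $loaded and $titles are illustrative names, not part of the original code.

```php
<?php
// Hypothetical inline sample standing in for the string returned by
// file_get_contents_curl(); trimmed from the feed excerpt in the question.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
    <title>eBaum's World - Featured Media</title>
    <item>
        <title>24 People Being Complete A$$holes</title>
        <link>http://www.ebaumsworld.com/pictures/view/84832600/</link>
    </item>
</channel>
</rss>
XML;

// Collect libxml parse errors instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

$rss = new DOMDocument();
// loadXML() parses a string; load() would treat its argument as a file name.
$loaded = $rss->loadXML($xml);

if (!$loaded) {
    // Each LibXMLError object carries the line, column and message.
    foreach (libxml_get_errors() as $error) {
        echo "line {$error->line}, column {$error->column}: " . trim($error->message) . PHP_EOL;
    }
    libxml_clear_errors();
    exit(1);
}

$titles = [];
foreach ($rss->getElementsByTagName('item') as $node) {
    $titles[] = $node->getElementsByTagName('title')->item(0)->nodeValue;
}

print implode(PHP_EOL, $titles) . PHP_EOL;
```

Printing the collected libxml errors should show the same kind of message Chrome reports for the real feed. Setting $rss->recover = true before calling loadXML() asks libxml to keep whatever it managed to parse, but whether that gets past the encoding error in this particular feed is untested here.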
