
PHP not parsing RSS using cURL properly

I just want to get the name of the 'channel' tag, i.e. CHANNEL. The script works fine when I use it to parse the RSS from Google, but when I use it for some other provider it outputs '#text' instead of the intended 'channel'. The following is my script; please help me out.

$url = 'http://ibnlive.in.com/ibnrss/rss/sports/cricket.xml';
$get = perform_curl($url);
// Stop early if the download or the XML check failed.
if (!$get['handle']) {
    exit(isset($get['content_error']) ? $get['content_error'] : $get['curl_error']);
}
$xml = new DOMDocument();
$xml->loadXML($get['remote_content']);
$fetch = $xml->documentElement;
$gettitle = $fetch->firstChild->nodeName;
echo $gettitle;

function perform_curl($rss_feed_provider_url) {

    $url = $rss_feed_provider_url;
    $return_result = array();
    $curl_handle = curl_init();

    // Do we have a cURL session?
    if ($curl_handle) {
        // Set the required cURL options that we need.
        // Set the URL option.
        curl_setopt($curl_handle, CURLOPT_URL, $url);
        // Set the HEADER option. We don't want the HTTP headers in the output.
        curl_setopt($curl_handle, CURLOPT_HEADER, false);
        // Set the FOLLOWLOCATION option. We will follow a Location header if present.
        curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
        // Instead of WRITEFUNCTION callbacks, receive the remote contents as the return value of curl_exec().
        curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);

        // Try to fetch the remote URL contents.
        // This call blocks until the contents are received; it returns false on failure.
        $remote_contents = curl_exec($curl_handle);

        // Clean up the cURL handle.
        curl_close($curl_handle);

        // Note: utf8_encode() assumes ISO-8859-1 input (and is deprecated as of PHP 8.2).
        $remote_contents = utf8_encode($remote_contents);

        $handle = @simplexml_load_string($remote_contents);
        if (is_object($handle)) {
            $return_result['handle'] = true;
            $return_result['remote_content'] = $remote_contents;
        } else {
            $return_result['handle'] = false;
            $return_result['content_error'] = 'INVALID RSS SOURCE, PLEASE CHECK IF THE SOURCE IS A VALID XML DOCUMENT.';
        }
        return $return_result;
    } // End of if ($curl_handle)
    else {
        // Return the same array shape as the success path instead of false.
        $return_result['handle'] = false;
        $return_result['curl_error'] = 'CURL INITIALIZATION FAILED.';
        return $return_result;
    }
}

It outputs '#text' instead of the intended 'channel' because $fetch->firstChild->nodeType is 3, which is XML_TEXT_NODE, i.e. just some text. You could select the channel element by

echo $fetch->getElementsByTagName('channel')->item(0)->nodeName;

and

$gettitle = $fetch -> firstChild -> nodeValue;
var_dump($gettitle); 

gives you

string(5) "
    "

that is, spaces and a newline character, which appear between the XML tags because of the formatting.
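A small self-contained sketch of the behaviour described above. It uses an assumed pretty-printed sample feed instead of the ibnlive URL so it runs without a network call, and the helper first_element_name() is hypothetical: it skips text nodes and returns the name of the first element child. The last part shows DOMDocument::preserveWhiteSpace which, when set to false before loading, makes the parser drop whitespace-only text nodes.

```php
<?php
// Hypothetical helper: name of the first element child of the root,
// skipping text nodes such as the whitespace between tags.
function first_element_name(DOMDocument $doc): ?string
{
    foreach ($doc->documentElement->childNodes as $node) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            return $node->nodeName;
        }
    }
    return null; // the root element has no element children
}

// Assumed sample: a small pretty-printed feed standing in for the real one.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Cricket</title>
    </channel>
</rss>
XML;

// Default parsing keeps the indentation as text nodes.
$doc = new DOMDocument();
$doc->loadXML($rss);
echo $doc->documentElement->firstChild->nodeName, "\n"; // #text
echo first_element_name($doc), "\n";                   // channel

// Alternative: drop whitespace-only text nodes while loading.
$compact = new DOMDocument();
$compact->preserveWhiteSpace = false; // must be set before loadXML()
$compact->loadXML($rss);
echo $compact->documentElement->firstChild->nodeName, "\n"; // channel
```

The helper approach works regardless of how the feed is formatted; preserveWhiteSpace is shorter but only removes text nodes that consist purely of whitespace.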

PS: the RSS feed at your link fails validation at http://validator.w3.org/feed/

Take a look at the XML: it has been pretty-printed with whitespace, so it is in fact being parsed correctly. The first child of the root node is a text node holding that whitespace (presumably the Google feed has no whitespace between those tags, which would explain why it worked there). I'd suggest using SimpleXML if you want an easier time of it, or using XPath queries on your DOMDocument to obtain the tags of interest.
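For the XPath route, a minimal sketch with DOMXPath, again on an assumed inline sample rather than $get['remote_content']. It assumes an RSS 2.0 feed whose rss root element has no default namespace; a feed that declares one (e.g. RSS 1.0/RDF) would need DOMXPath::registerNamespace() and prefixed queries.

```php
<?php
// Assumed sample input; in the question this would be $get['remote_content'].
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Cricket</title>
        <item><title>First story</title></item>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);

// '/rss/channel' only matches element nodes, so whitespace text is never returned.
$channel = $xpath->query('/rss/channel')->item(0);
echo $channel->nodeName, "\n"; // channel

// Query relative to the channel node: its own <title>, not the item titles.
echo $xpath->query('title', $channel)->item(0)->textContent, "\n"; // Cricket

// '*' selects element children only, the XPath form of "first element child".
echo $xpath->query('/rss/*')->item(0)->nodeName, "\n"; // channel
```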

Here's how you'd use SimpleXML:

$xml = new SimpleXMLElement($get['remote_content']);
print $xml->channel[0]->title;
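The SimpleXML snippet above, expanded into a runnable sketch on an assumed inline sample feed. SimpleXML only exposes elements, so the whitespace text nodes never show up, and SimpleXMLElement::getName() returns the tag name the question was after.

```php
<?php
// Assumed sample feed standing in for $get['remote_content'].
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Cricket</title>
    </channel>
</rss>
XML;

$xml = new SimpleXMLElement($rss);

// Element content and element name of the known <channel> child.
echo $xml->channel[0]->title, "\n";     // Cricket
echo $xml->channel[0]->getName(), "\n"; // channel

// Name of the first child element, without knowing it in advance.
$first = null;
foreach ($xml->children() as $child) {
    $first = $child->getName();
    break;
}
echo $first, "\n"; // channel
```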
