PHP not parsing RSS properly using cURL

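I just want to get the name of the 'channel' tag, i.e. CHANNEL. The script works fine when I use it to parse Google's RSS, but with some other providers it outputs '#text' instead of 'channel', which is the intended output. My script is below; please help.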

$url = 'http://ibnlive.in.com/ibnrss/rss/sports/cricket.xml';
$get = perform_curl($url);
$xml = new DOMDocument();
$xml->loadXML($get['remote_content']);
$fetch = $xml->documentElement;
$gettitle = $fetch->firstChild->nodeName;
echo $gettitle;

  function perform_curl($rss_feed_provider_url){

       $url = $rss_feed_provider_url;
       $curl_handle = curl_init();

       // Do we have a cURL session?
       if ($curl_handle) {
          // Set the required CURL options that we need.
          // Set the URL option.
          curl_setopt($curl_handle, CURLOPT_URL, $url);
          // Set the HEADER option. We don't want the HTTP headers in the output.
          curl_setopt($curl_handle, CURLOPT_HEADER, false);
          // Set the FOLLOWLOCATION option. We will follow if location header is present.
          curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
          // Instead of using WRITEFUNCTION callbacks, we are going to receive the remote contents as a return value for the curl_exec function.
          curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);

          // Try to fetch the remote URL contents.
          // This function will block until the contents are received.
          $remote_contents = curl_exec($curl_handle);

          // Do the cleanup of CURL.
          curl_close($curl_handle);

          $remote_contents = utf8_encode($remote_contents);

          $handle = @simplexml_load_string($remote_contents);
          $return_result = array();
          if(is_object($handle)){
              $return_result['handle'] = true;
              $return_result['remote_content'] = $remote_contents;
              return $return_result;
          }
          else{
              $return_result['handle'] = false;
              $return_result['content_error'] = 'INVALID RSS SOURCE, PLEASE CHECK IF THE SOURCE IS A VALID XML DOCUMENT.';
              return $return_result;
          }

        } // End of if ($curl_handle)
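      else{
        // Return the error array (not a bare false) so the caller can inspect it.
        $return_result = array('handle' => false, 'curl_error' => 'CURL INITIALIZATION FAILED.');
        return $return_result;
      }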
   } 

It outputs '#text' instead of the intended 'channel' because $fetch->firstChild->nodeType is 3, which is XML_TEXT_NODE, i.e. just some text. You can select the channel like this:

echo $fetch->getElementsByTagName('channel')->item(0)->nodeName;

$gettitle = $fetch -> firstChild -> nodeValue;
var_dump($gettitle); 

Dumping $fetch->firstChild->nodeValue as above gives you

string(5) "
    "

that is, the spaces and the newline character that happen to sit between the XML tags because of the formatting.
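
To make the whitespace issue concrete, here is a minimal sketch that does not touch the network. The inline feed in $rss and the helper first_element_child() are made up for illustration; they are not part of the original script. It shows two possible ways to skip the whitespace text node: walking childNodes until an element is found, or setting preserveWhiteSpace to false before loading.

```php
<?php
// Hypothetical pretty-printed feed, standing in for the remote RSS document.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Example feed</title>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);
$root = $doc->documentElement;

// The indentation between <rss> and <channel> is itself a node.
echo $root->firstChild->nodeName, "\n"; // #text

// Illustrative helper: return the first child that is an element, or null.
function first_element_child(DOMNode $parent): ?DOMNode
{
    foreach ($parent->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return $child;
        }
    }
    return null;
}

echo first_element_child($root)->nodeName, "\n"; // channel

// Alternative: ask the parser to drop whitespace-only text nodes.
// This property must be set before loadXML() is called.
$compact = new DOMDocument();
$compact->preserveWhiteSpace = false;
$compact->loadXML($rss);
echo $compact->documentElement->firstChild->nodeName, "\n"; // channel
```

The loop is the more defensive of the two, since it also skips comments and other non-element nodes that may precede the channel.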

P.S.: the RSS feed you linked does not validate at http://validator.w3.org/feed/

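Take a look at the XML: it has been pretty-printed with whitespace, so it is being parsed correctly. The first child of the root node is a text node. If you want an easier time, I suggest using SimpleXML, or running an XPath query on the DOMDocument to get the tags you are interested in.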
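
The XPath route could look roughly like the sketch below. The inline $rss string is a hypothetical stand-in for $get['remote_content'], and the paths assume a plain RSS 2.0 layout with no default namespace on the root element.

```php
<?php
// Hypothetical feed used in place of the fetched content.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Example feed</title>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($rss);

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; whitespace text nodes never match an element path.
$channels = $xpath->query('/rss/channel');
$channel = $channels->item(0);
echo $channel->nodeName, "\n"; // channel

// evaluate() with string() returns the text content of the first match directly.
$title = $xpath->evaluate('string(/rss/channel/title)');
echo $title, "\n"; // Example feed
```

Because the path names the element explicitly, the result does not depend on how the feed happens to be indented.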

Here is how you would do it with SimpleXML:

$xml = new SimpleXMLElement($get['remote_content']);
print $xml->channel[0]->title;
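
For the original goal of printing the tag name itself, a slightly fuller sketch with SimpleXML might look like this. Again, the inline $rss feed is an assumption standing in for $get['remote_content']; children() only yields element nodes, so the whitespace between tags does not show up.

```php
<?php
// Hypothetical feed used in place of the fetched content.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
    <channel>
        <title>Example feed</title>
    </channel>
</rss>
XML;

// Collect parse errors instead of emitting warnings.
libxml_use_internal_errors(true);
$xml = simplexml_load_string($rss);

if ($xml === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    exit(1);
}

// Take the name of the first child element of the root.
$firstName = null;
foreach ($xml->children() as $child) {
    $firstName = $child->getName();
    break;
}
echo $firstName, "\n"; // channel

// Cast to string to get the element's text content.
$title = (string) $xml->channel->title;
echo $title, "\n"; // Example feed
```

Using libxml_use_internal_errors() here replaces the @ suppression in the question's perform_curl() and keeps the parser messages available when a feed is malformed.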
