
PHP not parsing RSS properly using cURL

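I only want to get the name of the 'channel' tag, i.e. CHANNEL. The script works fine when I use it to parse Google's RSS, but when I use it for some other providers it outputs '#text' instead of the intended 'channel'. My script is below; please help.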

$url = 'http://ibnlive.in.com/ibnrss/rss/sports/cricket.xml';
$get = perform_curl($url);
$xml = new DOMDocument();
$xml->loadXML($get['remote_content']);
$fetch = $xml->documentElement;
$gettitle = $fetch->firstChild->nodeName;
echo $gettitle;

function perform_curl($rss_feed_provider_url){

       $url = $rss_feed_provider_url;
       $curl_handle = curl_init();

       // Do we have a cURL session?
       if ($curl_handle) {
          // Set the required CURL options that we need.
          // Set the URL option.
          curl_setopt($curl_handle, CURLOPT_URL, $url);
          // Set the HEADER option. We don't want the HTTP headers in the output.
          curl_setopt($curl_handle, CURLOPT_HEADER, false);
          // Set the FOLLOWLOCATION option. We will follow if location header is present.
          curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, true);
          // Instead of using WRITEFUNCTION callbacks, we are going to receive the remote contents as a return value for the curl_exec function.
          curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, true);

          // Try to fetch the remote URL contents.
          // This function will block until the contents are received.
          $remote_contents = curl_exec($curl_handle);

          // Do the cleanup of CURL.
          curl_close($curl_handle);

          $remote_contents = utf8_encode($remote_contents);

          $handle = @simplexml_load_string($remote_contents);
          $return_result = array();
          if(is_object($handle)){
              $return_result['handle'] = true;
              $return_result['remote_content'] = $remote_contents;
              return $return_result;
          }
          else{
              $return_result['handle'] = false;
              $return_result['content_error'] = 'INVALID RSS SOURCE, PLEASE CHECK IF THE SOURCE IS A VALID XML DOCUMENT.';
              return $return_result;
          }

        } // End of if ($curl_handle)
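      else{
        // Return the error array, as the other branches do, instead of a bare false.
        $return_result = array();
        $return_result['curl_error'] = 'CURL INITIALIZATION FAILED.';
        return $return_result;
      }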
   } 

It gives '#text' instead of the intended 'channel' because $fetch->firstChild->nodeType is 3, which is XML_TEXT_NODE: the first child is just some text, not an element. You can select the channel this way:

echo $fetch->getElementsByTagName('channel')->item(0)->nodeName;

$gettitle = $fetch -> firstChild -> nodeValue;
var_dump($gettitle); 

gives you

string(5) "
    "

that is, the spaces and newline that happen to sit between the XML tags because of the formatting.
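To make the whitespace node visible, here is a minimal sketch that uses a small inline sample instead of the real feed. The `$sample` document and the `first_element_child()` helper are hypothetical names introduced for illustration; the helper simply walks the children until it finds an element node.

```php
<?php
// Hypothetical stand-in for a pretty-printed RSS feed.
$sample = <<<'XML'
<rss version="2.0">
    <channel>
        <title>Example Feed</title>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($sample);
$root = $doc->documentElement;

// Hypothetical helper: return the first child that is an element,
// skipping text nodes (such as indentation) and comments.
function first_element_child(DOMNode $parent): ?DOMNode
{
    for ($node = $parent->firstChild; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeType === XML_ELEMENT_NODE) {
            return $node;
        }
    }
    return null;
}

$rawFirstName = $root->firstChild->nodeName;  // the indentation text node
$firstElement = first_element_child($root);   // the first real element

echo $rawFirstName, "\n";
echo $firstElement !== null ? $firstElement->nodeName : '(none)', "\n";
```

On PHP 8.0 and later, `DOMElement` also exposes a `firstElementChild` property that should serve the same purpose without a helper.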

P.S.: the RSS feed you linked does not validate at http://validator.w3.org/feed/

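Take a look at the XML: it has been pretty-printed with whitespace, so it is being parsed correctly. The first child of the root node is a text node. If you want an easier time, I suggest using SimpleXML, or using XPath queries on your DOMDocument to get the tags you are interested in.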
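Since the extra node comes purely from pretty-printing, another option (not suggested in the answers here, so treat it as an assumption to verify against your feed) is to ask DOMDocument to drop whitespace-only text nodes while parsing by setting `preserveWhiteSpace` to `false` before calling `loadXML()`. The `$sample` document below is a hypothetical stand-in for the real feed.

```php
<?php
// Hypothetical stand-in for a pretty-printed RSS feed.
$sample = <<<'XML'
<rss version="2.0">
    <channel>
        <title>Example Feed</title>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
// Must be set before loading; it has no effect on an already parsed document.
$doc->preserveWhiteSpace = false;
$doc->loadXML($sample);

// With blank text nodes removed, the first child should be the channel element.
$firstName = $doc->documentElement->firstChild->nodeName;
echo $firstName, "\n";
```

This only removes text nodes that consist entirely of whitespace, so the text inside elements such as `title` is left alone.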

Here is how you would do it with SimpleXML:

$xml = new SimpleXMLElement($get['remote_content']);
print $xml->channel[0]->title;
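The XPath route on DOMDocument could look roughly like the sketch below. It assumes a plain RSS 2.0 layout with no XML namespaces on `rss` or `channel`, and uses a hypothetical inline `$sample` in place of the downloaded feed.

```php
<?php
// Hypothetical stand-in for the downloaded feed content.
$sample = <<<'XML'
<rss version="2.0">
    <channel>
        <title>Example Feed</title>
    </channel>
</rss>
XML;

$doc = new DOMDocument();
$doc->loadXML($sample);

$xpath = new DOMXPath($doc);

// query() returns a DOMNodeList; whitespace text nodes never match an element path.
$channels = $xpath->query('/rss/channel');
$channelName = $channels->length > 0 ? $channels->item(0)->nodeName : null;

// evaluate() with string() returns a plain PHP string.
$title = $xpath->evaluate('string(/rss/channel/title)');

echo $channelName, "\n";
echo $title, "\n";
```

For Atom or otherwise namespaced feeds, the prefixes would need to be registered with `DOMXPath::registerNamespace()` before querying.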

