PHP - getElementsByTagName('rsslink') results inconsistent RSS feeds

I am trying to capture useful RSS feeds from different journals using getElementsByTagName('rsslink'). The following code gets the RSS feed items for LINK 1 but not for LINK 2. Both links show similar XML pages when opened in a browser, and I cannot figure out why the code does not capture the items for LINK 2.

<?php
      //$journalLink = 'https://onlinelibrary.wiley.com/rss/journal/10.1002/(ISSN)1944-7973'; //LINK 1
      $journalLink = 'https://onlinelibrary.wiley.com/feed/2199160x/most-recent'; // LINK 2
      
      $rss = new DOMDocument();
      $rss->load($journalLink);
      $feed = array();
      foreach ($rss->getElementsByTagName('item') as $node) {
          $item = array (
              'title' => $node->getElementsByTagName('title')->item(0)->nodeValue,
              'desc' => $node->getElementsByTagName('description')->item(0)->nodeValue,
              'link' => $node->getElementsByTagName('link')->item(0)->nodeValue,
          );
          array_push($feed, $item);
      } //foreach 
      $noRSSItem  = count($feed);
      echo "noRSSItem: $noRSSItem  <br/>"; 
?> 

The second URL uses the Atom namespace for the item link element. Atom allows multiple links per item, each with its own relation (rel) and type.
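As a minimal sketch of what "multiple links with different relations and types" looks like in practice, the snippet below parses a small inline feed and lists its Atom links. The sample XML, the `atom` prefix, and the example.com URLs are assumptions made up for illustration; they are not copied from the Wiley feed.

```php
<?php
// Hypothetical sample feed: one item carrying two namespaced Atom links.
$xml = <<<'XML'
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <item>
      <title>Example article</title>
      <atom:link rel="alternate" type="text/html" href="https://example.com/a1.html"/>
      <atom:link rel="self" type="application/atom+xml" href="https://example.com/a1"/>
    </item>
  </channel>
</rss>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

// Look the elements up by namespace URI and local name,
// so the prefix used inside the feed does not matter.
$atomNs = 'http://www.w3.org/2005/Atom';
$links = [];
foreach ($document->getElementsByTagNameNS($atomNs, 'link') as $linkNode) {
    // The URL lives in the href attribute, not in the element's text.
    $links[] = [
        'rel'  => $linkNode->getAttribute('rel'),
        'type' => $linkNode->getAttribute('type'),
        'href' => $linkNode->getAttribute('href'),
    ];
}

print_r($links);
```

Because the URL is an attribute value, reading nodeValue on such an element does not return it, which is why the link has to be selected by its rel and type and then read from href.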

The easiest way is to use XPath expressions.

// $journalLink holds the feed URL, as defined in the question
$document = new DOMDocument();
$document->load($journalLink);
$xpath = new DOMXPath($document);
// register a prefix for the Atom namespace
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');

// xpath expression can include count and conditions
// check if the amount of item elements is zero
$noRSSItem = $xpath->evaluate('count(//item) = 0');

$feed = array();
foreach ($xpath->evaluate('//item') as $itemNode) {
    $feed[] = [
        // fetch the content of the first title child element
        'title' => $xpath->evaluate('string(title)', $itemNode),
        // same for description
        'desc' => $xpath->evaluate('string(description)', $itemNode),
        // fetch the RSS link text, or the href of the Atom "self" link
        'link' => $xpath->evaluate(
            'string(link|a:link[@rel="self" and @type="application/atom+xml"]/@href)', 
            $itemNode
        ),
    ];
}

var_dump($feed);

XPath step by step

  • Atom 'link' child elements:
    a:link
  • with specific attributes:
    a:link[@rel="self" and @type="application/atom+xml"]
  • the 'href' attribute:
    a:link[@rel="self" and @type="application/atom+xml"]/@href
  • alternatively, fetch RSS (no namespace) 'link' child elements:
    link|a:link[@rel="self" and @type="application/atom+xml"]/@href
  • cast to string:
    string(link|a:link[@rel="self" and @type="application/atom+xml"]/@href)
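The steps above can be tried out one at a time. The following is a sketch against a made-up inline feed (the sample XML, its `atom` prefix, and the example.com URLs are assumptions for illustration), with one item using a plain RSS link and one using Atom links only.

```php
<?php
// Hypothetical sample feed with one RSS-style item and one Atom-linked item.
$xml = <<<'XML'
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Plain RSS item</title>
      <description>Uses an RSS link element</description>
      <link>https://example.com/rss-item</link>
    </item>
    <item>
      <title>Atom-linked item</title>
      <description>Uses Atom link elements</description>
      <atom:link rel="alternate" type="text/html" href="https://example.com/atom-item.html"/>
      <atom:link rel="self" type="application/atom+xml" href="https://example.com/atom-item"/>
    </item>
  </channel>
</rss>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);
// The prefix 'a' is our own choice; only the namespace URI has to match.
$xpath->registerNamespace('a', 'http://www.w3.org/2005/Atom');

$items = $xpath->evaluate('//item');
$atomItem = $items->item(1);

// Step 1: all Atom link children of the item.
$allAtomLinks = $xpath->evaluate('a:link', $atomItem);

// Step 2: only the link with the wanted rel and type.
$selfLinks = $xpath->evaluate(
    'a:link[@rel="self" and @type="application/atom+xml"]',
    $atomItem
);

// Step 3: the href attribute node of that link.
$hrefAttributes = $xpath->evaluate(
    'a:link[@rel="self" and @type="application/atom+xml"]/@href',
    $atomItem
);

// Steps 4 and 5: union with the plain RSS link, then cast to string.
$expression = 'string(link|a:link[@rel="self" and @type="application/atom+xml"]/@href)';
$results = [];
foreach ($items as $itemNode) {
    $results[] = $xpath->evaluate($expression, $itemNode);
}

var_dump($allAtomLinks->length, $selfLinks->length);
var_dump($hrefAttributes->item(0)->value);
var_dump($results);
```

The string() cast takes the first node of the union in document order, so an item that has both a plain link and a matching Atom link yields whichever appears first in the feed.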
