How to parse an XML node with a colon tag using PHP

I am trying to fetch the values of the following nodes from [this URL (takes quite some time to load)][1]. The elements I'm interested in are:

title, g:price and g:gtin

The XML starts like this:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <title>PhotoSpecialist.de</title>
    <link>http://www.photospecialist.de</link>
    <description/>
    <item>
      <g:id>BEN107C</g:id>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <description>
        Benbo Trekker Mk3 + Kugelkopf + Tasche Das Benbo Trekker Mk3 ist eine leichte Variante des beliebten Benbo 1. Sein geringes Gewicht macht das Trekker Mk3 zum idealen Stativ, wenn Sie viel draußen fotografieren und viel unterwegs sind. Sollten Sie in eine Situation kommen, in der maximale Stabilität zählt, verfügt das Benbo Trekker Mk3 über einen Haken an der Mittelsäule. An diesem können Sie das Stativ mit zusätzlichem Gewicht bei Bedarf beschweren. Dank der zwei besonderen Kamera-Befestigungsschrauben können Sie mit dem Benbo Trekker Mk3 sehr nah am Boden fotografieren. So nah, dass in vielen Fällen die einzige Einschränkung die Größe Ihrer Kamera darstellt. In diesem Set erhalten Sie das Benbo Trekker Mk3 zusammen mit einem Kugelkopf, Socket und einer Tasche für den sicheren und komfortablen Transport.
      </description>
      <link>
        http://www.photospecialist.de/benbo-trekker-mk3-kugelkopf-tasche?dfw_tracker=2469-16
      </link>
      <g:image_link>http://static.fotokonijnenberg.nl/media/catalog/product/b/e/benbo_trekker_mk3_tripod_kit_with_b__s_head__bag_ben107c1.jpg</g:image_link>
      <g:price>199.00 EUR</g:price>
      <g:condition>new</g:condition>
      <g:availability>in stock</g:availability>
      <g:identifier_exists>TRUE</g:identifier_exists>
      <g:brand>Benbo</g:brand>
      <g:gtin>5022361100576</g:gtin>
      <g:item_group_id>0</g:item_group_id>
      <g:product_type>Tripod</g:product_type>
      <g:mpn/>
      <g:google_product_category>Kameras &amp; Optik</g:google_product_category>
    </item>
  ...
  </channel>
</rss>

To get this, I have written the following code:

$z = new XMLReader;
$z->open('https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml');

$doc = new DOMDocument;

while ($z->read() && $z->name !== 'item')
    ;

while ($z->name === 'item')
{
    $node = new SimpleXMLElement($z->readOuterXML());
    $a = $node->title;
    $b = $node->price;
    $c = $node->gtin;
    echo $a . $b . $c . "<br />";
    $z->next('item');
}

This returns only the title; price and gtin are not showing.

The elements you're asking about are not part of the default namespace but in a different one. You can see that because they have a prefix in their name, separated by a colon:

  ...
  <channel>
    <title>PhotoSpecialist.de</title>
    <!-- title is in the default namespace, no colon in the name -->
    ...
    <g:price>199.00 EUR</g:price>
    ...
    <g:gtin>5022361100576</g:gtin>
    <!-- price and gtin are in a different namespace, colon in the name and prefixed by "g" -->
  ...

The namespace is referenced by a prefix, "g" in your case. The namespace that prefix stands for is defined on the document element here:

<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">

So the namespace is "http://base.google.com/ns/1.0".
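
If you would rather read the prefix-to-URI mapping from the document than hard-code it, SimpleXML can list the declared namespaces. A minimal sketch, assuming a shortened inline copy of the feed (the $xml string and the variable names are made up for illustration):

```php
<?php
// Shortened stand-in for the feed; only the namespace declaration on the
// root element matters here.
$xml = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
  <channel>
    <item>
      <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
      <g:price>199.00 EUR</g:price>
    </item>
  </channel>
</rss>
XML;

$rss = new SimpleXMLElement($xml);

// Namespaces declared in the document, as an array of prefix => URI.
// Passing false restricts the lookup to the root element's declarations.
$namespaces = $rss->getDocNamespaces(false);

echo $namespaces['g'], "\n";
```

The URI found this way is the value to pass to children() later on.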

When you access the child elements by name on the SimpleXMLElement as you currently do:

$a = $node->title;
$b = $node->price;
$c = $node->gtin;

you're looking only in the default namespace. So only the first element actually contains text; the other two are created on the fly and are still empty.
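
You can check this yourself by counting the matches. A small sketch, assuming a single item cut down from the feed and supplied as an inline string (the $xml string and variable names are illustrative):

```php
<?php
// Hypothetical single item, cut down from the feed shown in the question.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
  <g:price>199.00 EUR</g:price>
  <g:gtin>5022361100576</g:gtin>
</item>
XML;

$node = new SimpleXMLElement($xml);

// title carries no prefix, so plain property access finds one element.
$titleCount = count($node->title);
// price only exists as g:price, so the un-namespaced lookup matches nothing.
$priceCount = count($node->price);
// Casting the empty result to string gives an empty string.
$priceText = (string) $node->price;

var_dump($titleCount, $priceCount, $priceText);
```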

To access the namespaced child elements you need to tell the SimpleXMLElement explicitly, with the children() method. It creates a new SimpleXMLElement with all the children in that namespace instead of the default one:

$google = $node->children("http://base.google.com/ns/1.0");

$a = $node->title;
$b = $google->price;
$c = $google->gtin;

So much for the isolated example (yes, that's it already).
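
To try the isolated example without downloading the feed, here is a self-contained sketch. It assumes one item pasted in as an inline string; the constant name GOOGLE_BASE_NS and the variable names are illustrative. It also shows the second form of children(), which selects by prefix instead of by namespace URI:

```php
<?php
const GOOGLE_BASE_NS = 'http://base.google.com/ns/1.0';

// Hypothetical single item, cut down from the feed shown in the question.
$xml = <<<XML
<item xmlns:g="http://base.google.com/ns/1.0">
  <title>Benbo Trekker Mk3 + Kugelkopf + Tasche</title>
  <g:price>199.00 EUR</g:price>
  <g:gtin>5022361100576</g:gtin>
</item>
XML;

$node = new SimpleXMLElement($xml);

// Select the children by namespace URI ...
$google = $node->children(GOOGLE_BASE_NS);
// ... or by prefix, by passing true as the second argument.
$byPrefix = $node->children('g', true);

$title        = (string) $node->title;
$price        = (string) $google->price;
$gtin         = (string) $google->gtin;
$gtinByPrefix = (string) $byPrefix->gtin;

echo "$title - $price - $gtin\n";
```

Selecting by URI is usually the safer choice, because the prefix is only a shorthand chosen by whoever wrote the document and may differ between feeds.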

A full example could then look like this (including node expansion on the reader; the code you had was a bit rusty):

<?php
/**
 * How to parse an XML node with a colon tag using PHP
 *
 * @link http://stackoverflow.com/q/29876898/367456
 */
const HTTP_BASE_GOOGLE_COM_NS_1_0 = "http://base.google.com/ns/1.0";

$url = 'https://my.datafeedwatch.com/static/files/1248/8222ebd3847fbfdc119abc9ba9d562b2cdb95818.xml';

$reader = new XMLReader;
$reader->open($url);

$doc = new DOMDocument;

// move to first item element
while (($valid = $reader->read()) && $reader->name !== 'item') ;

while ($valid) {
    $default    = simplexml_import_dom($reader->expand($doc));
    $googleBase = $default->children(HTTP_BASE_GOOGLE_COM_NS_1_0);
    printf(
        "%s - %s - %s<br />\n"
        , htmlspecialchars($default->title)
        , htmlspecialchars($googleBase->price)
        , htmlspecialchars($googleBase->gtin)
    );

    // move to next item element
    $valid = $reader->next('item');
}

I hope this both gives an explanation and broadens the view a little on XMLReader use as well.

If the main tag itself has a colon in its name, you must use

$xml->next($xml->localName);

to move to the next item element.
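
A rough sketch of that case, assuming a made-up feed in which the repeated element is itself prefixed (the feed and g:entry element names, the second gtin value and all variable names are hypothetical). The reader's name property holds the prefixed name, while localName holds the part after the colon, which is what gets passed to next() here:

```php
<?php
const GOOGLE_BASE_NS = 'http://base.google.com/ns/1.0';

// Hypothetical feed: the repeated element carries the "g" prefix itself.
$xml = <<<XML
<feed xmlns:g="http://base.google.com/ns/1.0">
  <g:entry><g:gtin>5022361100576</g:gtin></g:entry>
  <g:entry><g:gtin>0000000000000</g:gtin></g:entry>
</feed>
XML;

$reader = new XMLReader;
$reader->XML($xml); // parse from a string instead of a URL
$doc = new DOMDocument;

// move to the first g:entry element; name includes the prefix
while (($valid = $reader->read()) && $reader->name !== 'g:entry') ;

$gtins = [];
while ($valid) {
    $entry   = simplexml_import_dom($reader->expand($doc));
    $gtins[] = (string) $entry->children(GOOGLE_BASE_NS)->gtin;

    // localName is "entry" (without the prefix); use it to find the next one
    $valid = $reader->next($reader->localName);
}
$reader->close();

print_r($gtins);
```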
