Multiple nodes with the same name in XML

I have the following XML file:

 <item> 
  <title>Troggs singer Reg Presley dies at 71</title>  
  <description>Reg Presley, the lead singer of British rock band The Troggs, whose hits in the 1960s included Wild Thing, has died aged 71.</description>  
  <link>http://www.bbc.co.uk/news/uk-21332048#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
  <guid isPermaLink="false">http://www.bbc.co.uk/news/uk-21332048</guid>  
  <pubDate>Tue, 05 Feb 2013 01:13:07 GMT</pubDate>  
  <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701366_65701359.jpg"/>  
  <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701387_65701359.jpg"/> 
</item>  
<item> 
  <title>Horsemeat found at Newry cold store</title>  
  <description>Horse DNA has been found in frozen meat in a cold store in Northern Ireland, as Irish police investigate a third case of contamination.</description>  
  <link>http://www.bbc.co.uk/news/world-europe-21331208#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
  <guid isPermaLink="false">http://www.bbc.co.uk/news/world-europe-21331208</guid>  
  <pubDate>Mon, 04 Feb 2013 23:47:38 GMT</pubDate>  
  <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65700000/jpg/_65700000_002950295-1.jpg"/>  
  <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65700000/jpg/_65700001_002950295-1.jpg"/> 
</item>  
<item> 
  <title>US 'will sue' Standard &amp; Poor's</title>  
  <description>Standard &amp; Poor's says it is to be sued by the US government over the credit ratings agency's assessment of mortgage bonds before the financial crisis.</description>  
  <link>http://www.bbc.co.uk/news/21331018#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
  <guid isPermaLink="false">http://www.bbc.co.uk/news/21331018</guid>  
  <pubDate>Mon, 04 Feb 2013 22:45:52 GMT</pubDate>  
  <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701717_mediaitem65699884.jpg"/>  
  <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701718_mediaitem65699884.jpg"/> 
   </item>  

When I pass "item" as the input node to retrieve data, only the last item node is displayed instead of all the item nodes.

My code is:

    $dom->load($url);
    $link = $dom->getElementsByTagName($tag_name);
    $value = array();

    for ($i = 0; $i < $link->length; $i++) {
        $childnode['name'] = $link->item($i)->nodeName;
        $childnode['value'] = $link->item($i)->nodeValue;
        $value[$childnode['name']] = $childnode['value'];
    }

Here, $url is the URL of my XML page and $tag_name is the name of the node, in this case "item".

The output I get is:

  US 'will sue' Standard &amp; Poor's.Standard &amp; Poor's says it is to be sued by the US government over the credit ratings agency's assessment of mortgage bonds before the financial crisis.http://www.bbc.co.uk/news/21331018#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa.http://www.bbc.co.uk/news/world-europe-21331208.Mon, 04 Feb 2013 22:45:52 GMT

This is the data of the last item tag only. I want the data of all the item tags, and I want it in this format:

title :-  US 'will sue' Standard &amp; Poor's
description :- Standard &amp; Poor's says it is to be sued by the US government over 
the credit ratings agency's assessment of mortgage bonds before the financial crisis

I also want the names of the child nodes (if any) in my output. Please help.

(Don't forget the root node.) It looks like the nodeValue property just concatenates all of the text nodes under that element (roughly equivalent to <xsl:value-of select="."/>). I haven't done much with the DOMDocument class and related classes in PHP, but one thing you can do is canonicalize the DOMNode using the C14N() method and then parse the resulting string. It isn't pretty, but it gets the result you want and is easily extensible:

    $tag_name = 'item';
    // $dom is the DOMDocument already loaded in the question's code
    $link = $dom->getElementsByTagName($tag_name);
    for ($i = 0; $i < $link->length; $i++) {
        // Serialize the current <item> subtree as canonical XML
        $treeAsString = $link->item($i)->C14N();
        $curBranchParts = explode("\n", $treeAsString);
        $curBranchPartsSize = count($curBranchParts);
        // Skip the first and last lines (the <item> open and close tags)
        for ($j = 1; $j < ($curBranchPartsSize - 1); $j++) {
            $curItem = $curBranchParts[$j];
            $curItemParts = explode('<', $curItem);
            if (!isset($curItemParts[1])) continue; // no tag on this line
            $tagWithContent = $curItemParts[1];
            $tagWithContentParts = explode('>', $tagWithContent);
            $tag = $tagWithContentParts[0];
            $content = $tagWithContentParts[1];

            if (trim($content) != '') echo $tag . ' :- ' . $content . '<br />';
            else echo $tag . '<br />';
        }
    }
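
To see why splitting on newlines works here, it may help to look at what C14N() returns for a single item. This is only a sketch: the one-item document is made up for illustration, and the media namespace URI declared on the root is an assumption (the snippet in the question does not show it).

```php
<?php
// Hypothetical one-item document; the media namespace URI is an assumption.
$xml = <<<XML
<root xmlns:media="http://search.yahoo.com/mrss/">
<item>
  <title>Horsemeat found at Newry cold store</title>
  <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65700000/jpg/_65700000_002950295-1.jpg"/>
</item>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

// C14N() returns the subtree as one string. The original line breaks are
// kept, and empty elements are written as start/end tag pairs.
$canonical = $dom->getElementsByTagName('item')->item(0)->C14N();
$lines = explode("\n", $canonical);

// Print each line with its index so the open/close lines are easy to spot.
foreach ($lines as $n => $line) {
    echo $n . ': ' . htmlspecialchars($line) . "<br />\n";
}
```

The first and last lines hold the item's own open and close tags, which is why the inner loop above starts at 1 and stops one short of the end. Note that this relies on each child element sitting on its own line in the source.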

You seem to be looping over the 'item' nodes only and, as other people mentioned, are overwriting the previous value on each iteration.

If you debug the $value array using print_r($value) inside the loop:

$dom->load($url);
$link = $dom->getElementsByTagName($tag_name);
$value = array();

for ($i = 0; $i < $link->length; $i++) {
    $childnode['name'] = $link->item($i)->nodeName;
    $childnode['value'] = $link->item($i)->nodeValue;
    $value[$childnode['name']] = $childnode['value'];

    echo 'iteration: ' . $i . '<br />';
    echo '<pre>'; print_r($value); echo '</pre>';
}

You'll probably see something like this:

// iteration: 0
Array
(
    [item] => Troggs singer Reg Presley dies at 71 ......
)

// iteration: 1
Array
(
    [item] => Horsemeat found at Newry cold store .........
)

// iteration: 2
Array
(
    [item] => US 'will sue' Standard & Poor's .........
)

What you should be doing is this:

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->load($url);
$items = $dom->getElementsByTagName($tag_name);
$values = array();

foreach ($items as $item) {
    $itemProperties = array();

    // Loop through the 'sub' items 
    foreach ($item->childNodes as $child) {
        // Note: using 'localName' to remove the namespace
        if (isset($itemProperties[(string) $child->localName])) {
            // Quickfix to support multiple 'thumbnails' per item (although they have no content)
            $itemProperties[$child->localName] = (array) $itemProperties[$child->localName];
            $itemProperties[$child->localName][] = $child->nodeValue;
        } else {
            $itemProperties[$child->localName] = $child->nodeValue;
        }
    }

    // Append the item to the 'values' array
    $values[] = $itemProperties;

}


// Output the result
echo '<pre>'; print_r($values); echo '</pre>';

Which outputs:

Array
(
    [0] => Array
        (
            [title] => Troggs singer Reg Presley dies at 71
            [description] => Reg Presley, the lead singer of British rock band The Troggs, whose hits in the 1960s included Wild Thing, has died aged 71.
            [link] => http://www.bbc.co.uk/news/uk-21332048#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa
            [guid] => http://www.bbc.co.uk/news/uk-21332048
            [pubDate] => Tue, 05 Feb 2013 01:13:07 GMT
            [thumbnail] => Array
                (
                    [0] => 
                    [1] => 
                )

        )

    [1] => Array
        (
            [title] => Horsemeat found at Newry cold store
            [description] => Horse DNA has been found in frozen meat in a cold store in Northern Ireland, as Irish police investigate a third case of contamination.
            [link] => http://www.bbc.co.uk/news/world-europe-21331208#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa
            [guid] => http://www.bbc.co.uk/news/world-europe-21331208
            [pubDate] => Mon, 04 Feb 2013 23:47:38 GMT
            [thumbnail] => Array
                (
                    [0] => 
                    [1] => 
                )

        )

    [2] => Array
        (
            [title] => US 'will sue' Standard & Poor's
            [description] => Standard & Poor's says it is to be sued by the US government over the credit ratings agency's assessment of mortgage bonds before the financial crisis.
            [link] => http://www.bbc.co.uk/news/21331018#sa-ns_mchannel=rss&ns_source=PublicRSS20-sa
            [guid] => http://www.bbc.co.uk/news/21331018
            [pubDate] => Mon, 04 Feb 2013 22:45:52 GMT
            [thumbnail] => Array
                (
                    [0] => 
                    [1] => 
                )

        )

)
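
The thumbnail entries come out empty because their data lives in attributes, not in text content. If you need that data too, something along these lines might work. It is a sketch, not a drop-in replacement: the inline feed stands in for the real $url, the media namespace URI is an assumption, and unlike the code above it stores every property as a list so that repeated elements need no special case.

```php
<?php
// Assumption: a small inline feed stands in for the real $url, and the
// media namespace URI is declared on the root so the prefix resolves.
$xml = <<<XML
<rss xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Troggs singer Reg Presley dies at 71</title>
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701366_65701359.jpg"/>
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701387_65701359.jpg"/>
    </item>
  </channel>
</rss>
XML;

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;
$dom->loadXML($xml);

$values = array();
foreach ($dom->getElementsByTagName('item') as $item) {
    $itemProperties = array();
    foreach ($item->childNodes as $child) {
        if ($child->nodeType !== XML_ELEMENT_NODE) {
            continue; // skip text nodes, comments, etc.
        }
        if ($child->hasAttributes() && trim($child->nodeValue) === '') {
            // Element with no text: keep its attributes as name => value
            $entry = array();
            foreach ($child->attributes as $attr) {
                $entry[$attr->nodeName] = $attr->nodeValue;
            }
        } else {
            $entry = $child->nodeValue;
        }
        // Always append, so each property is a list of one or more entries
        $itemProperties[$child->localName][] = $entry;
    }
    $values[] = $itemProperties;
}

echo '<pre>'; print_r($values); echo '</pre>';
```

With this layout the larger thumbnail's URL would be read as $values[0]['thumbnail'][1]['url'], and the title as $values[0]['title'][0].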

Your problem is that your source XML needs a root node (it can be called whatever you want). To be well-formed, an XML document must have exactly one root element: a single top-level element that has no parent and no sibling elements, and that contains all the others. Once you add the root node, your XML will load into your object.

For example:

<root>
    <item> 
      <title>Troggs singer Reg Presley dies at 71</title>  
      <description>Reg Presley, the lead singer of British rock band The Troggs, whose hits in the 1960s included Wild Thing, has died aged 71.</description>  
      <link>http://www.bbc.co.uk/news/uk-21332048#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
      <guid isPermaLink="false">http://www.bbc.co.uk/news/uk-21332048</guid>  
      <pubDate>Tue, 05 Feb 2013 01:13:07 GMT</pubDate>  
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701366_65701359.jpg"/>  
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701387_65701359.jpg"/> 
    </item>  
    <item> 
      <title>Horsemeat found at Newry cold store</title>  
      <description>Horse DNA has been found in frozen meat in a cold store in Northern Ireland, as Irish police investigate a third case of contamination.</description>  
      <link>http://www.bbc.co.uk/news/world-europe-21331208#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
      <guid isPermaLink="false">http://www.bbc.co.uk/news/world-europe-21331208</guid>  
      <pubDate>Mon, 04 Feb 2013 23:47:38 GMT</pubDate>  
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65700000/jpg/_65700000_002950295-1.jpg"/>  
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65700000/jpg/_65700001_002950295-1.jpg"/> 
    </item>  
    <item> 
      <title>US 'will sue' Standard &amp; Poor's</title>  
      <description>Standard &amp; Poor's says it is to be sued by the US government over the credit ratings agency's assessment of mortgage bonds before the financial crisis.</description>  
      <link>http://www.bbc.co.uk/news/21331018#sa-ns_mchannel=rss&amp;ns_source=PublicRSS20-sa</link>  
      <guid isPermaLink="false">http://www.bbc.co.uk/news/21331018</guid>  
      <pubDate>Mon, 04 Feb 2013 22:45:52 GMT</pubDate>  
      <media:thumbnail width="66" height="49" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701717_mediaitem65699884.jpg"/>  
      <media:thumbnail width="144" height="81" url="http://news.bbcimg.co.uk/media/images/65701000/jpg/_65701718_mediaitem65699884.jpg"/> 
    </item>
</root>
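
A quick way to check this yourself is to load the same content with and without a wrapper element. The fragment below is a shortened, made-up stand-in for the feed, and the variable names are only for illustration.

```php
<?php
// Collect parser errors internally instead of emitting PHP warnings
libxml_use_internal_errors(true);

// Hypothetical fragment: two sibling <item> elements and no root element
$fragment = '<item><title>A</title></item><item><title>B</title></item>';

// Without a root element the document is not well-formed
$bare = new DOMDocument();
$loadedBare = $bare->loadXML($fragment);

// Wrapped in a single root element, the same content parses
$wrapped = new DOMDocument();
$loadedWrapped = $wrapped->loadXML('<root>' . $fragment . '</root>');
$count = $wrapped->getElementsByTagName('item')->length;

var_dump($loadedBare, $loadedWrapped, $count);
```

If the first load fails and the second succeeds, the missing root element is what keeps the document from parsing.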

I think the problem is in this loop:

    for ($i = 0; $i < $link->length; $i++) {
        $childnode['name'] = $link->item($i)->nodeName;
        $childnode['value'] = $link->item($i)->nodeValue;
        $value[$childnode['name']] = $childnode['value'];
    } 

On each pass of the loop, $childnode['name'] and $childnode['value'] are overwritten with the new values. Because every node is named "item", $value['item'] is overwritten as well, so after the last iteration only the last item's data is left. To avoid this, use a multidimensional array indexed by $i, for example:

for ($i = 0; $i < $link->length; $i++) {
    $childnode['name'][$i] = $link->item($i)->nodeName;
    $childnode['value'][$i] = $link->item($i)->nodeValue;
    // Store this iteration's value only, not the whole 'value' array
    $value[$childnode['name'][$i]][$i] = $childnode['value'][$i];
}

To test it: print_r($childnode);
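
The difference between the two loops can be seen without any XML at all. In this sketch the $names and $texts arrays are hypothetical stand-ins for the nodeName and nodeValue of three item nodes.

```php
<?php
// Hypothetical stand-ins for nodeName / nodeValue of three <item> nodes
$names = array('item', 'item', 'item');
$texts = array('first', 'second', 'third');

$overwritten = array();
$collected = array();

for ($i = 0; $i < count($names); $i++) {
    // Same key on every pass, so each assignment replaces the previous one
    $overwritten[$names[$i]] = $texts[$i];
    // Adding the index $i as a second key keeps every value
    $collected[$names[$i]][$i] = $texts[$i];
}

print_r($overwritten);
print_r($collected);
```

Here $overwritten ends up with a single entry holding 'third', while $collected['item'] holds all three values.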
