
How to get XML namespace attributes with PHP simplexml

I'm pretty new to this, and I've followed several tutorials (including other SO questions), but I can't seem to get this to work.

I'm working with a library's EAD file (Library of Congress XML standard for describing library collections, http://www.loc.gov/ead/index.html), and I'm having trouble with the namespaces.

A simplified example of the XML:

<?xml version="1.0"?>
<ead xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd"  xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<c02 id="ref24" level="item">
                <did>
                    <unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
                    <unitid>1</unitid>
                    <container id="cid71717" type="Box" label="Mixed materials">1</container>
                    <physdesc>
                        <extent>Pencil</extent>
                    </physdesc>
                    <unitdate>[1817]</unitdate>
                </did>
                <dao id="ref001" ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
            </c02>
            <c02 id="ref25" level="item">
                <did>
                    <unittitle>Argus carryntas (Butterfly)</unittitle>
                    <unitid>2</unitid>
                    <container id="cid71715" type="Box" label="Mixed materials">1</container>
                    <physdesc>
                        <extent>Watercolor</extent>
                    </physdesc>
                    <unitdate>[1817]</unitdate>
                </did>
                <dao ns2:actuate="onRequest" ns2:show="embed" ns2:role="" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
            </c02>
</ead>

Following advice I found elsewhere, I was trying this (and variations on this theme):

<?php 
$entries = simplexml_load_file('test.xml');        
    foreach ($entries->c02->children('http://www.w3.org/1999/xlink') as $entry) {

      echo 'link: ', $entry->children('dao', true)->href, "\n";

  }
 ?> 

Which, of course, isn't working.

You have to understand the difference between a namespace and a namespace prefix. The namespace is the value inside the xmlns attributes. The xmlns attributes define the prefix, which is an alias for the actual namespace for that node and its descendants.

Your example has three namespaces:

http://www.w3.org/1999/xlink with the prefix "ns2"
urn:isbn:1-931666-22-9 without a prefix (the default namespace)
http://www.w3.org/2001/XMLSchema-instance with the prefix "xsi"

So elements and attributes starting with "ns2:" are in the xlink namespace, and those starting with "xsi:" are in the XML Schema instance namespace. All elements without a namespace prefix are in the ISBN-specific default namespace. Attributes without a namespace prefix are never in any namespace.
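Applied to SimpleXML, this distinction suggests passing the namespace URI (not the document's prefix) to children() and attributes(). The following is a minimal sketch, assuming a trimmed copy of the sample document with a closing </ead> added so it parses; the variable names ($ead, $xlink, $links) are made up for illustration:

```php
<?php
// Trimmed copy of the sample document (closing </ead> added so it is well-formed).
$xml = <<<'XML'
<?xml version="1.0"?>
<ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9">
  <c02 id="ref24" level="item">
    <dao id="ref001" ns2:show="embed" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
  </c02>
  <c02 id="ref25" level="item">
    <dao ns2:show="embed" ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
  </c02>
</ead>
XML;

$ead = simplexml_load_string($xml);

$links = [];
// Select the <c02> elements by the URI of the default namespace.
foreach ($ead->children('urn:isbn:1-931666-22-9')->c02 as $c02) {
    // attributes() is given the xlink namespace URI, so it does not matter
    // that this particular document happens to call the prefix "ns2".
    $xlink = $c02->dao->attributes('http://www.w3.org/1999/xlink');
    $links[] = (string) $xlink->href;
}

foreach ($links as $link) {
    echo 'link: ', $link, "\n";
}
```

Unprefixed attributes such as id are in no namespace, so they would still be read with a plain $c02['id'].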

If you query the XML DOM, you need to define your own namespace prefixes. The prefixes used inside an XML document can change, especially if the document comes from an external source.

I don't use SimpleXML, so here is a DOM example:

<?php

$xml = <<<'XML'
<?xml version="1.0"?>
<ead 
  xsi:schemaLocation="urn:isbn:1-931666-22-9 http://www.loc.gov/ead/ead.xsd"
  xmlns:ns2="http://www.w3.org/1999/xlink" 
  xmlns="urn:isbn:1-931666-22-9" 
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <c02 id="ref24" level="item">
    <did>
       <unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
    </did>
  </c02>
</ead>
XML;

// create dom and load the xml
$dom = new DOMDocument();
$dom->loadXML($xml);
// create an xpath object
$xpath = new DOMXPath($dom);
// register your own namespace prefix
$xpath->registerNamespace('isbn', 'urn:isbn:1-931666-22-9');

foreach ($xpath->evaluate('//isbn:unittitle', NULL, FALSE) as $node) {
  var_dump($node->textContent);
}

Output:

string(40) ""Lepidoptera and seas(on) of appearance""

XPath is quite powerful and the most convenient way to extract data from XML.

The default namespace in your case is weird. It looks like it might be dynamic, so you may need a way to read it. Here is the XPath expression for that:
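The same approach can reach the namespaced attributes the question asks about. Below is a rough sketch, assuming a trimmed copy of the sample document; the prefixes "ead" and "xlink" are chosen here for the query and are not taken from the document, which uses no prefix and "ns2" respectively:

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9">
  <c02 id="ref24" level="item">
    <dao ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:92"/>
  </c02>
  <c02 id="ref25" level="item">
    <dao ns2:href="http://diglib.amphilsoc.org/fedora/repository/graphics:87"/>
  </c02>
</ead>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Prefixes registered for the query only; they map to the same namespace URIs.
$xpath->registerNamespace('ead', 'urn:isbn:1-931666-22-9');
$xpath->registerNamespace('xlink', 'http://www.w3.org/1999/xlink');

$hrefs = [];
foreach ($xpath->evaluate('//ead:c02') as $c02) {
    // The second argument makes $c02 the context node for a relative expression.
    $id   = $xpath->evaluate('string(@id)', $c02);
    $href = $xpath->evaluate('string(ead:dao/@xlink:href)', $c02);
    $hrefs[$id] = $href;
}

var_dump($hrefs);
```

Note that @id needs no prefix because unprefixed attributes are in no namespace, while the href attribute has to be addressed through the xlink namespace.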

$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');

It reads the namespace without a prefix from the document element.
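To show how that value might be used, here is a small sketch that reads the default namespace first and then registers it under a prefix for later queries. The prefix "ead" and the trimmed sample document are assumptions made for illustration:

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<ead xmlns:ns2="http://www.w3.org/1999/xlink" xmlns="urn:isbn:1-931666-22-9">
  <c02 id="ref24" level="item">
    <did>
      <unittitle>"Lepidoptera and seas(on) of appearance"</unittitle>
    </did>
  </c02>
</ead>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Read the unprefixed (default) namespace declared on the document element.
$defaultNamespace = $xpath->evaluate('string(/*/namespace::*[name() = ""])');

// Register whatever URI was found under a prefix of our own choosing.
$xpath->registerNamespace('ead', $defaultNamespace);

$titles = [];
foreach ($xpath->evaluate('//ead:unittitle') as $node) {
    $titles[] = $node->textContent;
}

var_dump($defaultNamespace, $titles);
```

If the document element declares no default namespace, the expression yields an empty string, so a real script would probably want to check for that before registering it.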
