
How can I access the information in the 'Dublin Core' namespace received from external XML?


For the past day I have been struggling with some XML parsing in PHP. I use an external service that returns information about books as XML, with an ISBN as the search term. The service is provided by the German National Library and requires a private token in the request (this is not the cause of the problem, I have already checked that): https://www.dnb.de/EN/Professionell/Metadatendienste/Datenbezug/SRU/sru_node.html. I have also checked that 'allow_url_fopen' is enabled in php.ini.

Now, my problem is that whichever XML parsing method I use, the book information I need is not displayed in the SimpleXMLElement object and is not accessible for me to work with (see the result of the second 'echo' from my code below in this screenshot). If I first pull the XML as a string, the information is visible and accessible (see the result of the first 'echo' from my code below in this screenshot). The goal is to access the information about a book individually by element name (dc:title, dc:creator, dc:publisher, dc:date, etc.). With my current code this is not possible, because PHP reports "Warning: main(): Node no longer exists" when running through the 'foreach' loop.

I have already looked at several Stack Overflow posts about problems with namespaces in SimpleXMLElement objects, but I wasn't able to adapt the solutions proposed there to the problem I face here.
I hope that somebody can help me with this and point me to a solution, so I can access the information in the 'dc' namespace of the XML.

This is the very short and simple PHP code I have used so far:

$request = file_get_contents("http://externalXML.com"); //URL was replaced
echo "<pre>"; print_r($request); echo "</pre>"; 
$xml = simplexml_load_string($request);
echo "<pre>"; print_r($xml); echo "</pre>"; 
foreach ($xml->records->record->recordData->dc->children() as $child) {
    echo "Inhalt: " . $child . "<br>";
}

And this is the content of the XML. As I am always searching for a unique ISBN (see the 'query' element), there can only be zero or one result, never more:

<searchRetrieveResponse>
<version>1.1</version>
<numberOfRecords>1</numberOfRecords>
<records>
    <record>
    <recordSchema>oai_dc</recordSchema>
    <recordPacking>xml</recordPacking>
    <recordData>
        <dc>
            <dc:title>1968 : Worauf wir stolz sein dürfen / Gretchen Dutschke</dc:title>
            <dc:creator>Dutschke, Gretchen [Verfasser]</dc:creator>
            <dc:publisher>Hamburg : Sven Murmann Verlagsgesellschaft mbH</dc:publisher>
            <dc:date>2018</dc:date>
            <dc:language>ger</dc:language>
            <dc:identifier xsi:type="tel:URN">urn:nbn:de:101:1-201803147211</dc:identifier>
            <dc:identifier xsi:type="tel:URL">http://nbn-resolving.de/urn:nbn:de:101:1-201803147211</dc:identifier>
            <dc:identifier xsi:type="tel:ISBN">978-3-96196-007-1</dc:identifier>
            <dc:identifier xsi:type="tel:URL">http://d-nb.info/1154519600/34</dc:identifier>
            <dc:identifier xsi:type="tel:URL">https://www.kursbuch.online</dc:identifier>
            <dc:identifier xsi:type="dnb:IDN">1154519600</dc:identifier>
            <dc:subject>300 Sozialwissenschaften, Soziologie, Anthropologie</dc:subject>
            <dc:type>Online-Ressource</dc:type>
            <dc:relation>http://d-nb.info/1144647959</dc:relation>
        </dc>
    </recordData>
    <recordPosition>1</recordPosition>
    </record>
</records>
<nextRecordPosition>2</nextRecordPosition>
<echoedSearchRetrieveRequest>
<version>1.1</version>
<query>"9783961960071"</query>
<xQuery xsi:nil="true"/>
</echoedSearchRetrieveRequest>
</searchRetrieveResponse>

Cheers, Timo

Note: If the missing declarations are just a mistake in the question, this should be marked as a duplicate of "Reference - how do I handle namespaces (tags and attributes with colon in) in SimpleXML?"

If the XML is actually as shown in the question, it is invalid: there are no declarations for the namespace prefixes dc and xsi. If you check your PHP logs, or turn on display_errors, you will see dozens of warnings every time the XML is parsed.
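One way to confirm this without digging through the logs is to let libxml collect its errors and print them yourself. This is only a minimal sketch: the `$request` string is a shortened, hypothetical stand-in for the real response, and the exact wording of the messages depends on the libxml version.

```php
<?php
// Hypothetical, shortened stand-in for the response shown in the question:
// the "dc" prefix is used but never declared.
$request = '<recordData><dc><dc:title>1968</dc:title></dc></recordData>';

// Collect libxml errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$xml = simplexml_load_string($request);

// Each entry is a LibXMLError object (level, code, line, message, ...).
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message) . ' (line ' . $error->line . ")\n";
}

// Empty the error buffer so later parses start clean.
libxml_clear_errors();
```

If the list is empty for the real response, the declarations are present after all and the duplicate mentioned above applies instead.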

To work around this broken XML, you could wrap the response in an extra root element that defines the namespaces, resulting in valid XML.

// Define your namespace URIs somewhere, for reference
// Since you're faking them, they could be anything you like, but in case the XML
//  is fixed in future, you might as well use the values that were probably intended
define('XMLNS_DUBLIN_CORE', 'http://purl.org/dc/elements/1.1/');
define('XMLNS_XSD_INSTANCE', 'http://www.w3.org/2001/XMLSchema-instance');

// Add a wrapper with the missing namespace declarations around the whole document
$request = '<dummy xmlns:dc="' . XMLNS_DUBLIN_CORE . '" xmlns:xsi="' . XMLNS_XSD_INSTANCE . '">'
    . $request
    . "</dummy>";

// Parse the now-valid XML
$xml = simplexml_load_string($request);

// Pop the wrapper off to get the original root element
$xml = $xml->children()[0];

// Proceed as though the document had been defined properly
echo $xml->records->record->recordData->dc->children(XMLNS_DUBLIN_CORE)->title;
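To read every Dublin Core field rather than just the title, you can loop over the namespaced children and pick up the xsi:type attribute where there is one. The following is a sketch under the same assumptions as above: the namespace URIs are the ones that were probably intended, and `$wrapped` is a shortened, hypothetical sample of the already-wrapped response, not the real service output.

```php
<?php
// Assumed namespace URIs, same values as in the workaround.
$dcNs  = 'http://purl.org/dc/elements/1.1/';
$xsiNs = 'http://www.w3.org/2001/XMLSchema-instance';

// Shortened, hypothetical sample of the response after adding the wrapper.
$wrapped = '<dummy xmlns:dc="' . $dcNs . '" xmlns:xsi="' . $xsiNs . '">'
    . '<searchRetrieveResponse><records><record><recordData><dc>'
    . '<dc:title>1968 : Worauf wir stolz sein dürfen / Gretchen Dutschke</dc:title>'
    . '<dc:date>2018</dc:date>'
    . '<dc:identifier xsi:type="tel:ISBN">978-3-96196-007-1</dc:identifier>'
    . '<dc:identifier xsi:type="dnb:IDN">1154519600</dc:identifier>'
    . '</dc></recordData></record></records></searchRetrieveResponse>'
    . '</dummy>';

// Parse, then pop the wrapper off to get the original root element.
$xml = simplexml_load_string($wrapped)->children()[0];
$dc  = $xml->records->record->recordData->dc;

$rows = [];
foreach ($dc->children($dcNs) as $name => $element) {
    // xsi:type is a namespaced attribute, so request attributes in that namespace.
    // Elements without the attribute yield an empty string here.
    $type = (string) $element->attributes($xsiNs)->type;

    // Keep a list of rows rather than a map, since dc:identifier repeats.
    $rows[] = ['name' => $name, 'type' => $type, 'value' => (string) $element];
}

foreach ($rows as $row) {
    $label = $row['type'] === '' ? $row['name'] : $row['name'] . ' [' . $row['type'] . ']';
    echo $label . ': ' . $row['value'] . "\n";
}
```

Note that the wrapper only works if the response does not start with an XML declaration (`<?xml ... ?>`); if the real response has one, it would need to be stripped before wrapping.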
