
Trouble parsing the XML file with XPath and SimpleXML

I am having trouble parsing this XML file using SimpleXML and XPath.

<feed xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns="http://www.w3.org/2005/Atom" xmlns:pamedia="http://paimages.co.uk/pamedia.htm">
<title>
    Image / video search results
</title>
<subtitle>
    Images / video found containing the search string provided
</subtitle>
<pamedia:found>
    47
</pamedia:found>
<pamedia:perpage>
    100
</pamedia:perpage>
<pamedia:page>
    1
</pamedia:page>
<opensearch:totalResults>
    47
</opensearch:totalResults>
<opensearch:itemsPerpage>
    100
</opensearch:itemsPerpage>
<opensearch:startIndex>
    1
</opensearch:startIndex>
<id>
    http://images.pressassociation.com/cgi/search_api/?state=search&q=test+cricket+-pakistan+allincaption:+fast+ball
</id>
<link rel="self" href="http://images.pressassociation.com/cgi/search_api/?state=search&q=test+cricket+-pakistan+allincaption:+fast+ball"></link>
<updated>
    2013-11-19T09:46:42Z
</updated>
<link rel="self" href="http://images.pressassociation.com/cgi/search_api/?state=search&q=test+cricket+-pakistan+allincaption:+fast+ball">
    <updated>
        2013-11-19T09:46:42Z
    </updated>
    <name>
        Press Association Images
    </name>
    <email>
        REDACTED
    </email>
</link>
<entry>
    <pamedia:media-type>
        image/jpeg
    </pamedia:media-type>
    <pamedia:event_date>
        2011-06-21
    </pamedia:event_date>
    <pamedia:urn>
        11019393
    </pamedia:urn>
    <pamedia:domain>
        2
    </pamedia:domain>
    <pamedia:domain_prefix>
        PA
    </pamedia:domain_prefix>
    <link type="application/vnd.iptc.g2.newsitem+xml" href="http://images.pressassociation.com/meta/2.11019393.xml"></link>
    <link rel="related" href="http://images.pressassociation.com/meta/2.11019393.html" type="text/html"></link>
    <link rel="related" href="http://images.pressassociation.com/empicsthumbnail/vol111/block2204/11019393.jpg" type="image/jpeg"></link>
    <media:thumbnail width="153" medium="image" height="127" url="http://images.pressassociation.com/empicsthumbnail/vol111/block2204/11019393.jpg" type="image/jpeg"></media:thumbnail>
    <media:content expression="sample" medium="image" width="616" height="511" url="http://images.pressassociation.com/image/preview/2.11019393.jpg" type="image/jpeg"></media:content>
    <media:copyright>
        Associated Press
    </media:copyright>
    <media:content expression="full" medium="photo" width="1657" height="2000" url="http://images.pressassociation.com/image/2.11019393.jpg" type="image/jpeg"></media:content>
    <updated>
        2011-06-21T22:48:19Z
    </updated>
    <summary type="html">
        West Indies' fast bowler Fidel Edwards, left, reacts after his wicket keeper Carlton Baugh, unseen, couldn't hold his delivery as India's batsman Virat Kohli, right, watches the ball reach the boundary in the second innings on the second day of their first cricket Test match in Kingston, Jamaica, Tuesday June 21, 2011. (AP Photo/Andres Leighton)
    </summary>
    <rights type="html">
        UK picture buyers only JAM163
    </rights>
    <id>
        http://images.pressassociation.com/meta/2.11019393.xml
    </id>
    <title type="html">
        Jamaica India West Indies Cricket
    </title>
    <category term="S"></category>
    <author>
        <name>
            Andres Leighton/AP
        </name>
    </author>
</entry>
<entry>
    <pamedia:media-type>
        image/jpeg
    </pamedia:media-type>
    <pamedia:event_date>
        2011-06-21
    </pamedia:event_date>
    <pamedia:urn>
        11019370
    </pamedia:urn>
    <pamedia:domain>
        2
    </pamedia:domain>
    <pamedia:domain_prefix>
        PA
    </pamedia:domain_prefix>
    <link type="application/vnd.iptc.g2.newsitem+xml" href="http://images.pressassociation.com/meta/2.11019370.xml"></link>
    <link rel="related" href="http://images.pressassociation.com/meta/2.11019370.html" type="text/html"></link>
    <link rel="related" href="http://images.pressassociation.com/empicsthumbnail/vol111/block2204/11019370.jpg" type="image/jpeg"></link>
    <media:thumbnail width="161" medium="image" height="127" url="http://images.pressassociation.com/empicsthumbnail/vol111/block2204/11019370.jpg" type="image/jpeg"></media:thumbnail>
    <media:content expression="sample" medium="image" width="650" height="511" url="http://images.pressassociation.com/image/preview/2.11019370.jpg" type="image/jpeg"></media:content>
    <media:copyright>
        Associated Press
    </media:copyright>
    <media:content expression="full" medium="photo" width="1571" height="2000" url="http://images.pressassociation.com/image/2.11019370.jpg" type="image/jpeg"></media:content>
    <updated>
        2011-06-21T22:35:22Z
    </updated>
    <summary type="html">
        India's batsman Virat Kohli ducks to avoid being hit by a short ball off West Indies' fast bowler Fidel Edwards in the second innings on the second day of their first cricket Test match in Kingston, Jamaica, Tuesday June 21, 2011. (AP Photo/Andres Leighton)
    </summary>
    <rights type="html">
        UK picture buyers only JAM160
    </rights>
    <id>
        http://images.pressassociation.com/meta/2.11019370.xml
    </id>
    <title type="html">
        Jamaica India West Indies Cricket
    </title>
    <category term="S"></category>
    <author>
        <name>
            Andres Leighton/AP
        </name>
    </author>
</entry>

I am trying to select the entry tags and pull back the information contained within them, specifically the link[3] blocks, so I can embed the link to the thumbnail image. I have connected to the API and fetched the data back, but the XPath queries I have tried must be wrong.

I have tried $query = /entry/link[3]/href and some others to no avail. I am pretty new to using XPath to query XML data. Any help would be greatly appreciated.

There are actually multiple things wrong:

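First, as Michael Kay already pointed out, the XML elements in question are in a namespace (the default Atom namespace declared by xmlns="http://www.w3.org/2005/Atom" on the feed element), while the query looks for elements in no namespace. You have to register that namespace for XPath queries under a prefix (you can choose the prefix arbitrarily; I'll use 'namespace' in my example) and then use the prefix on every element name in the path.

Second, the query was missing the '/feed' first segment. A path that starts with '/' is evaluated from the document root, so it has to begin with the root element.

Third, selecting the href attribute requires .../@href, since .../href would select a child element named href.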

Complete code (in PHP, since SimpleXML is a PHP extension):

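// Load the feed; replace the placeholder with the XML text returned by the API.
$xml = new SimpleXMLElement('... xml source text ...');
// Bind an arbitrary prefix to the Atom namespace so XPath can match the elements.
$xml->registerXPathNamespace('namespace', 'http://www.w3.org/2005/Atom');
$query = '/namespace:feed/namespace:entry/namespace:link[3]/@href';
// var_dump() is used here because debug() is not a PHP built-in.
var_dump($xml->xpath($query));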

Note that xpath() returns an array of SimpleXMLElement objects, each wrapping only the href attribute, whereas one might expect plain attribute nodes. Casting an item to a string yields the attribute value. I think this is more due to SimpleXML's strange API than it is about XPath itself.
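
To make the last point concrete, here is a minimal sketch of collecting the thumbnail URLs as plain strings. The trimmed-down feed, the example.com URLs, the 'atom' prefix, and the variable names are all made up for illustration; only the two namespace URIs come from the feed in the question. It also shows a possible alternative to the positional link[3] lookup: reading the url attribute of media:thumbnail, which does not depend on the order of the link elements.

```php
<?php
// Hypothetical, trimmed-down feed; the example.com URLs are placeholders.
$source = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:media="http://search.yahoo.com/mrss/">
  <entry>
    <link type="application/vnd.iptc.g2.newsitem+xml" href="http://example.com/meta/1.xml"></link>
    <link rel="related" href="http://example.com/meta/1.html" type="text/html"></link>
    <link rel="related" href="http://example.com/thumb/1.jpg" type="image/jpeg"></link>
    <media:thumbnail url="http://example.com/thumb/1.jpg" type="image/jpeg"></media:thumbnail>
  </entry>
  <entry>
    <link type="application/vnd.iptc.g2.newsitem+xml" href="http://example.com/meta/2.xml"></link>
    <link rel="related" href="http://example.com/meta/2.html" type="text/html"></link>
    <link rel="related" href="http://example.com/thumb/2.jpg" type="image/jpeg"></link>
    <media:thumbnail url="http://example.com/thumb/2.jpg" type="image/jpeg"></media:thumbnail>
  </entry>
</feed>
XML;

$xml = new SimpleXMLElement($source);

// The prefixes are arbitrary; only the namespace URIs have to match the document.
$xml->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');
$xml->registerXPathNamespace('media', 'http://search.yahoo.com/mrss/');

// Positional lookup, as in the answer: the third <link> of every <entry>.
$byPosition = [];
foreach ($xml->xpath('/atom:feed/atom:entry/atom:link[3]/@href') as $attr) {
    // Each result is a SimpleXMLElement; the cast extracts the attribute value.
    $byPosition[] = (string) $attr;
}

// Alternative lookup: the url attribute of each <media:thumbnail>.
$byElement = [];
foreach ($xml->xpath('/atom:feed/atom:entry/media:thumbnail/@url') as $attr) {
    $byElement[] = (string) $attr;
}

echo implode(PHP_EOL, $byPosition), PHP_EOL;
```

Whether media:thumbnail is always present in the real API responses is an assumption based on the two entries shown in the question, so the positional query may still be the safer choice until that is confirmed.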


 