Extract an XML element value from an RSS feed

I have an RSS feed and I need to extract the latest pubDate element from it for my test. What is the best way to do that?

RSS Feed link: https://secure.hyper-reach.com/rss/310085

Sample XML:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <atom:link href="https://secure.hyper-reach.com/rss/310085" rel="self" type="application/rss+xml" />
        <link>https://secure.hyper-reach.com/rss/310085</link>
        <title>Hyper-Reach Automated Test Account alerts feed "Automated RSS Test"</title>
        <description>Constant feed of alerts from Automated Test Account via hyper-reach.com</description>
        <lastBuildDate>Fri, 21 Nov 2014 00:56:15 -0500</lastBuildDate>
        <language>null</language>
        <ttl>5</ttl>
        <item>
            <title>Alert (2014-11-21)</title>
            <pubDate>Fri, 21 Nov 2014 00:56:15 -0500</pubDate>
            <description>This is a test message.</description>
            <link>https://secure.hyper-reach.com/servlet/getprompt?prompt_id=122967&amp;ver=0&amp;format=34&amp;nologin=1</link>
            <guid isPermaLink="false">https://secure.hyper-reach.com/rss/item/257029</guid>
        </item>
        <item>...</item>
        <item>...</item>
</channel>
</rss>

What I am doing:

checkRSSFeed = function() {
    //first I navigate to a certain page in my website
    var href = '';

    casper.then(function() {
        this.test.assertExists(x('//a[contains(@href, "SUBSTRING OF URL")]'), 'the element exists');
        href = casper.getElementAttribute(x('//a[contains(@href, "SUBSTRING OF URL")]'), 'href');
     }).then(function() {
        this.open(href);
     }).then(function() {
        this.echo(this.getCurrentUrl());

        var pubDate = '';
        this.getPageContent();
        pubDate = this._utils_.getElementByXPath('.//pubDate');
     });
};  

The error I am getting is

uncaughtError: TypeError: 'undefined' is not an object (evaluating 'this._utils_.getElementByXPath')

To retrieve the pubDate content you can use the casper.fetchText function, but its drawback is that it concatenates the text of all matching elements into one string:

casper.echo(casper.fetchText("pubDate"));

would print

Fri, 21 Nov 2014 00:56:15 -0500Fri, 21 Nov 2014 00:47:34 -0500Fri, 21 Nov 2014 00:45:36 -0500

To retrieve the texts separately, you can use casper.getElementsInfo, which works on multiple elements and provides a text property for each. A simple map then produces an array that you can work with:

var pubDates = casper.getElementsInfo("pubDate").map(function(elementInfo){
    return elementInfo.text; // or even `return new Date(elementInfo.text)`
});

But since you only want the latest one and RSS feeds are usually sorted newest to oldest, you can simply use the first one (note the lack of an s in getElementInfo):

var pubDate = casper.getElementInfo("pubDate").text;
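If you would rather not rely on the feed's ordering, here is a minimal sketch in plain JavaScript that parses each date string and keeps the newest one. The helper name latestPubDate is hypothetical and not part of CasperJS. It assumes the strings use the RFC 822 style shown in the sample feed and that Date.parse accepts that format in your runtime:

```javascript
// Hypothetical helper, not part of CasperJS: returns the newest date string
// from an array of date strings, or null if none of them can be parsed.
function latestPubDate(dateStrings) {
    var latest = null;
    var latestTime = -Infinity;
    dateStrings.forEach(function (text) {
        var time = Date.parse(text); // NaN when the string is not a valid date
        if (!isNaN(time) && time > latestTime) {
            latestTime = time;
            latest = text;
        }
    });
    return latest;
}

// The three values from the feed output above, deliberately shuffled.
var sample = [
    "Fri, 21 Nov 2014 00:47:34 -0500",
    "Fri, 21 Nov 2014 00:56:15 -0500",
    "Fri, 21 Nov 2014 00:45:36 -0500"
];
console.log(latestPubDate(sample));
```

In a CasperJS script you could pass it the pubDates array built with getElementsInfo above.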

Your previous approach would have worked if you had run it in the page context. The clientutils module is only accessible in the page context (inside casper.evaluate), where it is exposed as __utils__:

var pubDate = this.evaluate(function(){
    return __utils__.getElementByXPath('//pubDate').innerText;
});

Note that __utils__ has two underscores on each side. Also, you cannot pass DOM elements from the page context out to the casper context, but you can pass strings and other primitive values. That is why I returned the innerText property of the DOM element. The documentation says this:

Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.
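To get a feel for that rule of thumb, here is a rough sketch in plain JavaScript that sends a value through a JSON round trip. The helper name jsonRoundTrip is hypothetical, and this only approximates the rule; it does not reproduce what CasperJS does internally:

```javascript
// Hypothetical helper: shows what a value looks like after being
// serialized to JSON and parsed back again.
function jsonRoundTrip(value) {
    var serialized = JSON.stringify(value);
    // JSON.stringify returns undefined for values it cannot serialize at all,
    // such as a bare function.
    return serialized === undefined ? undefined : JSON.parse(serialized);
}

// A plain string survives unchanged.
console.log(jsonRoundTrip("Fri, 21 Nov 2014 00:56:15 -0500"));

// A function-valued property is silently dropped, leaving only the text.
console.log(jsonRoundTrip({ text: "a", callback: function () {} }));
```

Strings, numbers, and plain objects or arrays made of them come back intact, which is why returning the element's text rather than the element itself is the safe choice.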
