I have an RSS feed and I need to extract the latest pubDate element from it for my test. What is the best way to do that?
RSS Feed link: https://secure.hyper-reach.com/rss/310085
Sample XML:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<atom:link href="https://secure.hyper-reach.com/rss/310085" rel="self" type="application/rss+xml" />
<link>https://secure.hyper-reach.com/rss/310085</link>
<title>Hyper-Reach Automated Test Account alerts feed "Automated RSS Test"</title>
<description>Constant feed of alerts from Automated Test Account via hyper-reach.com</description>
<lastBuildDate>Fri, 21 Nov 2014 00:56:15 -0500</lastBuildDate>
<language>null</language>
<ttl>5</ttl>
<item>
<title>Alert (2014-11-21)</title>
<pubDate>Fri, 21 Nov 2014 00:56:15 -0500</pubDate>
<description>This is a test message.</description>
<link>https://secure.hyper-reach.com/servlet/getprompt?prompt_id=122967&ver=0&format=34&nologin=1</link>
<guid isPermaLink="false">https://secure.hyper-reach.com/rss/item/257029</guid>
</item>
<item>...</item>
<item>...</item>
</channel>
</rss>
What I am doing:
checkRSSFeed = function() {
    // first I navigate to a certain page in my website
    var href = '';
    casper.then(function() {
        this.test.assertExists(x('//a[contains(@href, "SUBSTRING OF URL")]'), 'the element exists');
        href = casper.getElementAttribute(x('//a[contains(@href, "SUBSTRING OF URL")]'), 'href');
    }).then(function() {
        this.open(href);
    }).then(function() {
        this.echo(this.getCurrentUrl());
        var pubDate = '';
        this.getPageContent();
        pubDate = this._utils_.getElementByXPath('.//pubDate');
    });
};
The error I am getting is
uncaughtError: TypeError: 'undefined' is not an object (evaluating 'this._utils_.getElementByXPath')
To retrieve the pubDate content you can use the casper.fetchText function, but it has the drawback that it concatenates the text of all matching elements into one string:
casper.echo(casper.fetchText("pubDate"));
would print
Fri, 21 Nov 2014 00:56:15 -0500Fri, 21 Nov 2014 00:47:34 -0500Fri, 21 Nov 2014 00:45:36 -0500
To retrieve each text separately you can use casper.getElementsInfo, which works on multiple elements and provides the text property. A simple mapping then generates an array that you can work on:
var pubDates = casper.getElementsInfo("pubDate").map(function(elementInfo){
    return elementInfo.text; // or even `return new Date(elementInfo.text)`
});
But since you only want the latest one and RSS feeds are usually sorted newest to oldest, you can simply use the first one (note the lack of an s in getElementInfo):
var pubDate = casper.getElementInfo("pubDate").text;
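If you would rather not rely on the feed being sorted, you could parse every date and pick the newest one yourself. The following is only a minimal sketch: latestPubDate is a hypothetical helper (not part of CasperJS), and it assumes the JavaScript engine's Date constructor accepts the RFC 822 style strings the feed uses. The sample strings are the three dates shown above, deliberately put out of order.

```javascript
// Hypothetical helper, not part of CasperJS: returns the most recent Date
// from an array of date strings, or null if none of them can be parsed.
function latestPubDate(dateStrings) {
    var latest = null;
    dateStrings.forEach(function (text) {
        var parsed = new Date(text);
        if (isNaN(parsed.getTime())) {
            return; // skip strings the engine cannot parse
        }
        if (latest === null || parsed.getTime() > latest.getTime()) {
            latest = parsed;
        }
    });
    return latest;
}

// Sample values taken from the feed output above, shuffled on purpose.
var samplePubDates = [
    "Fri, 21 Nov 2014 00:47:34 -0500",
    "Fri, 21 Nov 2014 00:56:15 -0500",
    "Fri, 21 Nov 2014 00:45:36 -0500"
];

console.log(latestPubDate(samplePubDates).toISOString()); // 2014-11-21T05:56:15.000Z
```

In a CasperJS script you would pass in the pubDates array produced by the getElementsInfo mapping instead of samplePubDates.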
Your previous approach would have worked if you had done this in the page context. The clientutils module is only accessible in the page context (inside casper.evaluate):
var pubDate = this.evaluate(function(){
    return __utils__.getElementByXPath('//pubDate').innerText;
});
Note that __utils__ has two underscores on each side. Also, you cannot pass DOM elements from the page context out to the casper context, but you can pass strings and other primitive objects. That is why I returned the innerText property of the DOM element. The documentation says this:
Note: The arguments and the return value to the evaluate function must be a simple primitive object. The rule of thumb: if it can be serialized via JSON, then it is fine.
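To get a feel for that rule of thumb, you can run a value through JSON yourself. This is only an illustration of the "serializable via JSON" guideline, not a description of how PhantomJS moves values between contexts; roundTrip and fakeElement are hypothetical names made up for the example.

```javascript
// Hypothetical helper: mimics the rule of thumb by serializing to JSON and back.
function roundTrip(value) {
    return JSON.parse(JSON.stringify(value));
}

// Stand-in for a DOM element: one string property and one method.
var fakeElement = {
    innerText: "Fri, 21 Nov 2014 00:56:15 -0500",
    click: function () {}
};

// A plain string comes back unchanged.
console.log(roundTrip(fakeElement.innerText)); // Fri, 21 Nov 2014 00:56:15 -0500

// The object loses its method, because functions are not valid JSON.
console.log(JSON.stringify(roundTrip(fakeElement))); // {"innerText":"Fri, 21 Nov 2014 00:56:15 -0500"}
```

This is why returning a string such as innerText from evaluate is safe, while returning the element itself is not.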