I am trying to run a simple PhantomJS script, but with no luck.
var system = require('system');
var site = system.args[1];
var page = require('webpage').create();

// Log JavaScript errors raised by the page itself, with their stack trace.
page.onError = function (msg, trace) {
    console.log(msg);
    trace.forEach(function (item) {
        console.log('  ', item.file, ':', item.line);
    });
};

page.open("https://www.mightydeals.co.uk/Products/all/National/Grey-Small/132212", function () {
    // Collect the text of every #productInformation element.
    var p = page.evaluate(function () {
        return [].map.call(document.querySelectorAll('#productInformation'), function (link) {
            return link.innerText;
        });
    });
    console.log(p);
    // Exit only after the page has been loaded and evaluated.
    phantom.exit();
});
The page URL is the one passed to page.open above.
I am getting only errors and null output.
I need to get the product descriptions, but the script returns no description, only errors.
I can see in the browser console that the page itself has an error:
Uncaught SyntaxError: Unexpected token <
Is the page error causing the problem, or is it something else? Please advise.
Some sites interpret the default PhantomJS requests (with no header settings) as coming from a mobile device. In this case, when you call page.open, the requested URL is redirected to
http://m.mightydeals.co.uk/index.html#dealList/productId=132212&menu1Id=1&menu2Id=0&
which doesn't have any #productInformation element, so the selector matches nothing.
You can check this behavior by calling page.render('page.png') (which takes a screenshot) inside the page.open callback, before page.evaluate.
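As a rough sketch of that check, the script below logs the final URL and saves a screenshot. The helpers hostOf and wasRedirected are hypothetical names introduced here for illustration, and comparing hosts is only an assumed heuristic for spotting the mobile redirect. page.url, page.render and phantom.exit are the PhantomJS calls.

```javascript
// Hypothetical helper: extract the host part of an absolute URL.
function hostOf(url) {
  var m = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

// Hypothetical helper: true when the page ended up on a different host
// than the one that was requested.
function wasRedirected(requestedUrl, finalUrl) {
  return hostOf(requestedUrl) !== hostOf(finalUrl);
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var page = require('webpage').create();
  var requested = 'https://www.mightydeals.co.uk/Products/all/National/Grey-Small/132212';

  page.open(requested, function (status) {
    console.log('status:', status);
    console.log('final URL:', page.url);
    console.log('different host:', wasRedirected(requested, page.url));
    page.render('page.png'); // screenshot of what PhantomJS actually loaded
    phantom.exit();
  });
}
```

If the logged final URL starts with http://m.mightydeals.co.uk/, the request was served the mobile page.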
A quick fix for this is to set custom headers before calling page.open:
page.customHeaders = {
'User-Agent' : 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0',
'Accept': '*/*',
'Accept-Language': 'nb-NO,nb;q=0.9,no-NO;q=0.8,no;q=0.6,nn-NO;q=0.5,nn;q=0.4,en-US;q=0.3,en;q=0.1',
'Connection': 'keep-alive'
};
Alternatively, scrape the elements you need from the mobile version of the page.
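For the header approach, a minimal sketch of the whole script might look like the following. scrapeText and DESKTOP_HEADERS are hypothetical names introduced here, and only the User-Agent header from the answer is kept, on the assumption that it is the one that matters. The page object is passed in as a parameter so the function can be tried with a stub outside PhantomJS.

```javascript
// Desktop-browser User-Agent taken from the headers in the answer.
var DESKTOP_HEADERS = {
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:42.0) Gecko/20100101 Firefox/42.0'
};

// Hypothetical helper: set the headers, open the URL, and pass the text of
// every element matching `selector` to done(err, texts).
function scrapeText(page, url, selector, done) {
  page.customHeaders = DESKTOP_HEADERS; // must be set before page.open
  page.open(url, function (status) {
    if (status !== 'success') {
      done(new Error('Failed to load ' + url));
      return;
    }
    // The inner function runs in the page context, so the selector
    // is handed over as an argument to page.evaluate.
    var texts = page.evaluate(function (sel) {
      return [].map.call(document.querySelectorAll(sel), function (el) {
        return el.innerText;
      });
    }, selector);
    done(null, texts);
  });
}

// This part only runs inside PhantomJS, where the global `phantom` exists.
if (typeof phantom !== 'undefined') {
  var system = require('system');
  var site = system.args[1]; // URL given on the command line
  scrapeText(require('webpage').create(), site, '#productInformation', function (err, texts) {
    console.log(err ? err.message : texts.join('\n'));
    phantom.exit(err ? 1 : 0); // exit only after the callback has run
  });
}
```

Calling phantom.exit inside the callback matters: if it runs at the top level of the script, PhantomJS may quit before the page has finished loading.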