Scraping multiple pages with PhantomJS/Pjscrape
I'm trying to scrape multiple pages, but I can't get the array of URL IDs to work within the pjscrape .js file.
I'm pretty sure I'm making a newbie mistake, but I would appreciate some help. Thanks :)
pjs.config({
timeoutInterval: 6000,
timeoutLimit: 10000,
})
pjs.addSuite({
// single URL or array
url: abolaURLs,
scraper: function(){
var abolaURLs = [366762,366764,366763];
for (var i = 0; i<abolaURLs.length; i++) {
abolaURLs[i] = 'http://abola.pt/nnh/ver.aspx?id=' + abolaURLs[i];
};
var results[];
var cenas1 = $('div#a5g2').text();
var cenas2 = $('span#noticiatext').text();
var cenas3 = $('div#a5x').text();
results.push(cenas1, cenas2, cenas3);
return results;
}
});
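This will work for you. Build the URL array before calling pjs.addSuite: in your version abolaURLs is declared inside the scraper function, so it does not exist yet when url: abolaURLs is evaluated. Also, var results[]; is a syntax error; it should be var results = [];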
// Build the full URLs first, outside the suite definition
var abolaURLs = [366762, 366764, 366763];
for (var i = 0; i < abolaURLs.length; i++) {
    abolaURLs[i] = 'http://abola.pt/nnh/ver.aspx?id=' + abolaURLs[i];
}
pjs.addSuite({
    // single URL or array
    url: abolaURLs,
    scraper: function() {
        var results = []; // fixed: "var results[];" is not a valid array declaration
        var cenas1 = $('div#a5g2').text();
        var cenas2 = $('span#noticiatext').text();
        var cenas3 = $('div#a5x').text();
        results.push(cenas1, cenas2, cenas3);
        return results;
    }
});
pjs.config({
    timeoutInterval: 6000,
    timeoutLimit: 10000
});
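If you would rather keep the list of IDs untouched, the URL-building step can be pulled into a small helper. This is only a sketch: buildUrls and BASE_URL are hypothetical names of my own, not part of pjscrape, and it assumes every page follows the same ver.aspx?id= pattern as in the question.

```javascript
// Hypothetical helper (not part of pjscrape): turns article IDs into full URLs.
var BASE_URL = 'http://abola.pt/nnh/ver.aspx?id=';

function buildUrls(ids) {
    // map returns a new array, so the original ID list is left unchanged
    return ids.map(function (id) {
        return BASE_URL + id;
    });
}

var abolaIds = [366762, 366764, 366763];
var abolaURLs = buildUrls(abolaIds);
```

The resulting abolaURLs array can then be passed as the url option of pjs.addSuite, exactly as in the code above.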