Make various requests using nightmare.js

I'm using Nightmare.js to scrape a website. First, I make a request to fetch some links, each of which leads to another page with more information that I also want. I've split this into two functions:

const { csvFormat } = require('d3-dsv');
const Nightmare = require('nightmare');
const { writeFileSync } = require('fs');

const url = 'https://lojaonline.claro.com.br/celular';

function getUrls (){
    console.log('Extraindo Links...');
    const nightmare = new Nightmare({show: true});
    var p1 = '51030';
    var p2 = '560';
    try{
        nightmare.goto(url).wait('input[id="edit-cep-part1"]')
                           .type('input[id="edit-cep-part1"]', p1)
                           .wait('input[id="edit-cep-part2"]')
                           .type('input[id="edit-cep-part2"]', p2)
                           .click('input[value="Confirmar"]')
                           .wait('#products-container .products-list').evaluate(function(){

            return Array.from(document.querySelectorAll('.offer')).map(element => element.href);            

        }).end()
        .then(function(result){

            var listaUrls = Object.values(result);

            return listaUrls;

        })
        .then(function(listaUrls){
            listaUrls.forEach(function(link){
                console.log('Pegando preços de ' + link);
                getPrecos(link);
            });
        });
    }catch(e){
        console.error(e);
    }
};

function getPrecos(endereco) {
    console.log('Extraindo preços...');
    const nightmare = new Nightmare({gotoTimeout: 999999999});
    var p1 = '51030';
    var p2 = '560';
    try{

         nightmare.goto(endereco).wait('input[id="edit-cep-part1"]')
                                .type('input[id="edit-cep-part1"]', p1)
                                .wait('input[id="edit-cep-part2"]')
                                .type('input[id="edit-cep-part2"]', p2)
                                .click('input[value="Confirmar"]')
                                .wait('#plans-tab').evaluate(function(){

            return Array.from(document.querySelectorAll('tr.body')).map(element => element.innerText);          

        }).end()
        .then(function(result){

            var listaPrecos = Object.values(result);

            console.log(listaPrecos);
        });
    }catch(e){
        console.error(e);
    }
};

getUrls();

It works for the most part. Some requests succeed and I'm able to get the information, but others time out after 30 seconds:

 UnhandledPromiseRejectionWarning: Error: .wait() for #plans-tab timed out after 30000msec.

I have to wait, type, and click because this particular website asks for a zip code before displaying the data. And if I set show: true inside the getPrecos function, 20 instances of Electron pop up. What am I doing wrong here?

Is there a way to fire a request only after the previous one has finished?

You are using forEach to loop through the list. forEach does not wait for the promise that each call creates, so every getPrecos call (and its Electron instance) starts at once. If you want them to fetch data one by one, use for...of with async/await, or a promise library with concurrency support.

listaUrls.forEach(function(link) {
  console.log("Pegando preços de " + link);
  getPrecos(link);
});
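
To make the difference concrete, here is a minimal sketch that does not use Nightmare at all. fakeGetPrecos is a hypothetical stand-in that resolves after a short timer, and the active and peak counters are only there to measure how many tasks overlap. It is an illustration of the looping behavior under those assumptions, not the asker's actual scraper.

```javascript
// Hypothetical stand-in for getPrecos: resolves after `ms` milliseconds.
// `active` counts tasks currently running; `peak` records the maximum overlap.
let active = 0;
let peak = 0;

function fakeGetPrecos(link, ms = 20) {
  active += 1;
  peak = Math.max(peak, active);
  return new Promise(resolve => {
    setTimeout(() => {
      active -= 1;
      resolve(link);
    }, ms);
  });
}

// forEach ignores the promises returned by its callback,
// so every task is started before any of them has finished.
async function runWithForEach(links) {
  active = 0;
  peak = 0;
  const pending = [];
  links.forEach(link => {
    pending.push(fakeGetPrecos(link));
  });
  await Promise.all(pending); // only so the sketch can report the peak
  return peak;
}

// for...of with await starts a task only after the previous one has settled.
async function runSequentially(links) {
  active = 0;
  peak = 0;
  for (const link of links) {
    await fakeGetPrecos(link);
  }
  return peak;
}
```

With three links, runWithForEach resolves to 3 (all tasks overlap), while runSequentially resolves to 1. In the question, each overlapping task is a separate Electron instance, which would explain the 20 windows.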

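The forEach loop over listaUrls can be rewritten with async/await and a for...of loop, as shown below. Note that await only pauses the loop if getPrecos returns a promise, so getPrecos also needs to return its nightmare chain (that is, return nightmare.goto(endereco)... instead of just calling it).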

// other part of code
.then(async function(listaUrls) { // <-- async function
  for (const link of listaUrls) {
    console.log("Pegando preços de " + link);
    await getPrecos(link); // <-- process the links one by one
  }
});
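
If strictly one-at-a-time turns out to be too slow, the "concurrency support" option mentioned above can be sketched without any extra library. The helper below, mapWithLimit, is a hypothetical name introduced here for illustration; it keeps at most limit calls in flight and assumes the worker function returns a promise.

```javascript
// Runs `worker` over `items` with at most `limit` calls in flight at once.
// Results keep the order of `items`. `worker` must return a promise.
// If any call rejects, the returned promise rejects with that error.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to hand out

  // Each lane pulls the next unclaimed item until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next;
      next += 1;
      results[index] = await worker(items[index], index);
    }
  }

  const laneCount = Math.max(1, Math.min(limit, items.length));
  const lanes = [];
  for (let i = 0; i < laneCount; i += 1) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}
```

Assuming getPrecos has been changed to return its nightmare chain, the last step of getUrls could then be written as .then(listaUrls => mapWithLimit(listaUrls, 2, getPrecos)), which would keep at most two Electron instances open at a time. A limit of 1 behaves like the for...of loop above.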

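Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.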

 