How to deal with a loop of promises dependent on another promise

I'm new to JavaScript and I'm having trouble with promises. I'm using cloudscraper to retrieve a webpage's HTML to scrape data from. I have a simple function, getData(), which calls cloudscraper.get() and passes the HTML to the extract() function, which is responsible for scraping data. This is the working code:

const getData = function(pageUrl) {
  var data;
  return cloudscraper.get(pageUrl)
    .then(function(html) {
      data = extract(html);
      return data;  
    })
    .catch(function(err) {
      // handle error
    })
}

The "data" object returned contains an array of URLs I want to connect to in order to retrieve other information. That information has to be stored in the same data object, so I want to call cloudscraper.get() again for each URL contained in the array. I've tried the code below:

const getData = function(pageUrl) {
  var data;
  // first cloudscraper call:
  // retrieve main html
  return cloudscraper.get(pageUrl)
    .then(function(html) {
      // scrape data from it
      data = extract(html);
      for (let i = 0; i < data.array.length; ++i) {
        // for each URL scraped, call cloudscraper
        // to retrieve other data
        return cloudscraper.get(data.array[i])
          .then(function(newHtml) {
            // get other data with cheerio
            // and stores it in the same array
            data.array[i] = getNewData(newHtml);
          })
          .catch(function(err) {
            // handle error
          }) 
        }
        return data;  
      })
    .catch(function(err) {
      // handle error
    })
}

But it doesn't work, because the data object is returned before the promises in the loop are resolved. I know there is probably a simple solution, but I couldn't figure it out, so could you please help me? Thanks in advance.

The best way to avoid these kinds of problems is to use async/await, as suggested in the comments. Here's an example based on your code:

const getData = async function(pageUrl) {
  var data;
  // first cloudscraper call:
  // retrieve main html
  try {
    const html = await cloudscraper.get(pageUrl);
    // scrape data from it
    data = extract(html);
    for (let i = 0; i < data.array.length; ++i) {
      // for each URL scraped, call cloudscraper
      // to retrieve other data
      const newHtml = await cloudscraper.get(data.array[i]);
      // get other data with cheerio
      // and stores it in the same array
      data.array[i] = getNewData(newHtml); // if getNewData is also async, you need to add await
    }
  } catch (error) {
    // handle error
  }
  return data;
}
// You can call getData with .then().catch() outside of async functions 
// and with await inside async functions

This can be significantly simplified by using Promise.all together with async/await.

If my understanding is correct, you are trying to execute the steps below:

  1. Get the original HTML.
  2. Extract data from it (it looks like you're after some more URLs).
  3. For each URL extracted, call cloudscraper again.
  4. Put the results of each call back into the original data object.

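const getData = async (pageUrl) => {
  const html = await cloudscraper.get(pageUrl);
  const data = extract(html);
  // Start all requests at once; Promise.all keeps results in input order
  const promises = data.array.map((url) => cloudscraper.get(url));
  const results = await Promise.all(promises);
  // Map the results back into the original data object.
  // Assigning to the forEach callback parameter would not change the array,
  // so write to data.array[idx] instead.
  results.forEach((newHtml, idx) => {
    data.array[idx] = getNewData(newHtml);
  });
  return data;
};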
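
To see the Promise.all approach run end to end without touching the network, here is a minimal sketch. Everything in it except Promise.all itself is an assumption made for illustration: fakeGet stands in for cloudscraper.get, the pages object is canned data, and extract and getNewData are toy versions of the helpers from the question.

```javascript
// Hypothetical stand-ins so the sketch runs without network access.
// fakeGet plays the role of cloudscraper.get; extract and getNewData
// mimic the question's helpers.
const pages = {
  '/main': 'a,b,c',
  a: 'page A',
  b: 'page B',
  c: 'page C',
};

const fakeGet = (url) =>
  new Promise((resolve, reject) => {
    // Later URLs resolve sooner, to show that result order is still preserved.
    const delay = url === 'a' ? 30 : url === 'b' ? 20 : 10;
    setTimeout(() => {
      if (url in pages) resolve(pages[url]);
      else reject(new Error(`not found: ${url}`));
    }, delay);
  });

const extract = (html) => ({ array: html.split(',') });
const getNewData = (html) => html.toUpperCase();

const getData = async (pageUrl) => {
  const data = extract(await fakeGet(pageUrl));
  // Start every request at once; Promise.all resolves to results in input order.
  data.array = await Promise.all(
    data.array.map(async (url) => getNewData(await fakeGet(url)))
  );
  return data;
};

getData('/main').then((data) => console.log(data.array));
```

Even though the request for 'c' finishes first, the logged array keeps the original order of the URLs. One trade-off compared with the sequential loop: Promise.all rejects as soon as any single request fails, so one bad URL fails the whole call unless each request handles its own error.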
