How to deal with a loop of promises dependent on another promise
I'm new to JavaScript and I'm having trouble with promises. I'm using cloudscraper to retrieve a webpage's HTML so that I can scrape data from it. I have a simple function, getData(), which calls cloudscraper.get() and passes the HTML to the extract() function, which is responsible for scraping the data. This is the working code:
const getData = function(pageUrl) {
  var data;
  return cloudscraper.get(pageUrl)
    .then(function(html) {
      data = extract(html);
      return data;
    })
    .catch(function(err) {
      // handle error
    })
}
The returned data object contains an array of URLs that I want to connect to in order to retrieve further information. That information has to be stored in the same data object, so I want to call cloudscraper.get() again for each URL in the array. I've tried the code below:
const getData = function(pageUrl) {
  var data;
  // first cloudscraper call:
  // retrieve main html
  return cloudscraper.get(pageUrl)
    .then(function(html) {
      // scrape data from it
      data = extract(html);
      for (let i = 0; i < data.array.length; ++i) {
        // for each URL scraped, call cloudscraper
        // to retrieve other data
        return cloudscraper.get(data.array[i])
          .then(function(newHtml) {
            // get other data with cheerio
            // and stores it in the same array
            data.array[i] = getNewData(newHtml);
          })
          .catch(function(err) {
            // handle error
          })
      }
      return data;
    })
    .catch(function(err) {
      // handle error
    })
}
But it doesn't work, because the data object is returned before the promises in the loop are resolved. I know there is probably a simple solution, but I couldn't figure it out. Could you please help me? Thanks in advance.
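A minimal sketch of what the attempt above does at run time, assuming a hypothetical fakeGet() in place of cloudscraper.get(). The return inside the for loop leaves the .then() callback during the first iteration, so only the first URL is requested, and the final return after the loop is never reached when the array is non-empty.

```javascript
// Hypothetical stand-in for cloudscraper.get(): resolves after a short delay.
const fakeGet = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));

// Records which URLs were actually requested and processed.
const visited = [];

const brokenLoop = function (urls) {
  return Promise.resolve(urls).then(function (list) {
    for (let i = 0; i < list.length; ++i) {
      // This `return` exits the callback on the first pass,
      // so list[1], list[2], ... are never requested.
      return fakeGet(list[i]).then(function (html) {
        visited.push(list[i]);
      });
    }
    return list; // unreachable when the list is non-empty
  });
};

brokenLoop(['a', 'b', 'c']).then(function (result) {
  console.log(visited); // only 'a' was processed
  console.log(result);  // undefined, because the inner .then() returns nothing
});
```

The same effect appears in the question's code: the caller receives undefined rather than the data object.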
The best way to avoid these kinds of problems is to use async/await, as suggested in the comments. Here's an example based on your code:
const getData = async function(pageUrl) {
  var data;
  // first cloudscraper call:
  // retrieve main html
  try {
    const html = await cloudscraper.get(pageUrl);
    // scrape data from it
    data = extract(html);
    for (let i = 0; i < data.array.length; ++i) {
      // for each URL scraped, call cloudscraper
      // to retrieve other data
      const newHtml = await cloudscraper.get(data.array[i]);
      // get other data with cheerio
      // and stores it in the same array
      data.array[i] = getNewData(newHtml); // if getNewData is also async, you need to add await
    }
  } catch (error) {
    // handle error
  }
  return data;
}
// You can call getData with .then().catch() outside of async functions
// and with await inside async functions
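The two calling styles mentioned in the comments above could look roughly like this. This is only a sketch: cloudscraper, extract() and getNewData() are replaced here by hypothetical stand-ins so that it runs without network access, and getData() is a condensed version of the function above.

```javascript
// Hypothetical stand-ins (not the real cloudscraper, extract or getNewData).
const cloudscraper = {
  get: async (url) => `<html>${url}</html>`,
};
const extract = (html) => ({
  array: ['https://example.com/a', 'https://example.com/b'],
});
const getNewData = (html) => ({ length: html.length });

// Condensed version of the sequential async/await loop.
const getData = async function (pageUrl) {
  const data = extract(await cloudscraper.get(pageUrl));
  for (let i = 0; i < data.array.length; ++i) {
    data.array[i] = getNewData(await cloudscraper.get(data.array[i]));
  }
  return data;
};

// Outside an async function: chain .then() and .catch().
getData('https://example.com')
  .then((data) => console.log('then:', data.array))
  .catch((err) => console.error(err));

// Inside an async function: use await with try/catch.
const main = async () => {
  try {
    const data = await getData('https://example.com');
    console.log('await:', data.array);
  } catch (err) {
    console.error(err);
  }
};
main();
```

Note that this loop requests the URLs one after another; each request starts only after the previous one has finished.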
This can be significantly simplified by using Promise.all and async/await.
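If my understanding is correct, you are trying to execute the steps below:
1. Get the original HTML with cloudscraper.
2. Extract data from that HTML, including the array of URLs.
3. For each extracted URL, call cloudscraper again.
4. Put the result of each call back into the original data object.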
const getData = async (pageUrl) => {
  const html = await cloudscraper.get(pageUrl);
  const data = extract(html);
  // start every request at once and wait for all of them to finish
  const promises = data.array.map(d => cloudscraper.get(d));
  const results = await Promise.all(promises);
  // If you wanted to map the results back into the original data object
  // (assigning to the forEach callback parameter would not change the array)
  results.forEach((result, idx) => { data.array[idx] = result; });
  return data;
};
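One thing to keep in mind is that Promise.all rejects as soon as any single request fails. If partial results are acceptable, a variant using Promise.allSettled might look like the sketch below. The cloudscraper, extract() and getNewData() definitions are hypothetical stand-ins so the example runs on its own, and storing an { error } object for failed URLs is just one possible convention.

```javascript
// Hypothetical stand-ins: any URL containing "bad" fails to load.
const cloudscraper = {
  get: async (url) => {
    if (url.includes('bad')) throw new Error(`cannot fetch ${url}`);
    return `<html>${url}</html>`;
  },
};
const extract = (html) => ({
  array: ['https://example.com/ok', 'https://example.com/bad'],
});
const getNewData = (html) => ({ size: html.length });

const getData = async (pageUrl) => {
  const html = await cloudscraper.get(pageUrl);
  const data = extract(html);
  // Start all requests at once; allSettled waits for every one
  // and reports each outcome instead of rejecting on the first failure.
  const settled = await Promise.allSettled(
    data.array.map((url) => cloudscraper.get(url))
  );
  // Keep the scraped data for successes and an error marker for failures.
  data.array = settled.map((result) =>
    result.status === 'fulfilled'
      ? getNewData(result.value)
      : { error: result.reason.message }
  );
  return data;
};

getData('https://example.com').then((data) => console.log(data.array));
```

Promise.allSettled preserves the input order, so each entry in data.array still corresponds to the URL at the same index.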