Async not awaiting function before running

I'm trying to parse a specification website from saved HTML on my computer. I can post the file upon request.

I'm burnt out trying to figure out why it won't run in order. The log statements should print the CCCC lines first, then the BBBB lines, and finally a single AAAA line.

The code I'm running will not wait at the first hurdle (it prints AAAA... first). Am I using request-promise incorrectly? What is going on?

Is this due to the .each() method of cheerio (I'm assuming it's synchronous)?

const rp = require('request-promise');
const fs = require('fs');
const cheerio = require('cheerio');

async function parseAutodeskSpec(contentsHtmlFile) {
  const topics = [];
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';

  contentsPage(contentsSelector).each(async (idx, topicsAnchor) => {
    const topicsHtml = await rp(topicsAnchor.attribs['href']);
    console.log("topicsHtml.length: ", topicsHtml.length);
  });

  console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

  return topics;
}

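As @lumio stated in his comment, I also think this happens because cheerio's each() is synchronous: it calls your async callback once per element, discards the promise each call returns, and returns immediately, so the AAAA line is logged before any request has finished.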
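
To see the effect without cheerio or a network, here is a minimal sketch using a plain array's forEach(), which treats an async callback the same way cheerio's each() does: it calls it and ignores the returned promise. The names fakeRequest and eachDoesNotWait are hypothetical; fakeRequest is a timer-based stand-in for rp().

```javascript
// Hypothetical stand-in for rp(): resolves with a fake page body after `ms` milliseconds
function fakeRequest(url, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), ms));
}

async function eachDoesNotWait() {
  const log = [];
  const urls = ['a.html', 'b.html', 'c.html'];

  // forEach, like cheerio's each(), calls the callback and discards the promise it returns
  urls.forEach(async (url) => {
    const html = await fakeRequest(url, 10);
    log.push(`fetched ${url} (${html.length} chars)`);
  });

  // Runs before any callback has resumed after its await
  log.push('AAAA');

  // Demo only: sleep long enough for the stray callbacks to finish so their entries show up
  await new Promise((resolve) => setTimeout(resolve, 50));
  return log;
}

eachDoesNotWait().then((log) => console.log(log));
```

'AAAA' ends up as the first entry, even though it is the last statement in the loop's enclosing code.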

Use the map() method instead, so the promises are kept, and await Promise.all() on the result so the function waits until every request has finished:

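// map() keeps the promise returned by each async callback (each() throws it away)
const obj = contentsPage(contentsSelector).map(async (idx, topicsAnchor) => {
  const topicsHtml = await rp(topicsAnchor.attribs['href']);
  console.log("topicsHtml.length: ", topicsHtml.length);

  // parseAutodeskTopics is assumed to exist; it is not defined in the question
  const topicsFromPage = await parseAutodeskTopics(topicsHtml);
  console.log("topicsFromPage.length: ", topicsFromPage.length);

  // concat() returns a new array and leaves topics untouched; push() appends in place
  topics.push(...topicsFromPage);
});

// cheerio's map() returns a cheerio object; toArray() gives a plain array of the promises
const promises = obj.toArray();

await Promise.all(promises);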

console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
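
The same idea as a self-contained sketch, again with a hypothetical timer-based fakeRequest in place of rp() and a plain array in place of the cheerio selection (Array.prototype.map keeps the returned promises just as cheerio's map() does). The delays are deliberately out of order; mapThenAwaitAll is a hypothetical name.

```javascript
// Hypothetical stand-in for rp(): resolves with a fake page body after `ms` milliseconds
function fakeRequest(url, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), ms));
}

async function mapThenAwaitAll() {
  const log = [];
  const urls = ['a.html', 'b.html', 'c.html'];
  const delays = [30, 10, 20]; // requests finish out of order on purpose

  // map() keeps each async callback's promise instead of discarding it
  const pending = urls.map(async (url, i) => {
    const html = await fakeRequest(url, delays[i]);
    log.push(`fetched ${url}`); // pushed in completion order
    return html;
  });

  // Resolves once every request is done; the array follows the input order
  const pages = await Promise.all(pending);
  log.push('AAAA'); // now logged last

  return { log, pages };
}

mapThenAwaitAll().then(({ log, pages }) => {
  console.log(log);
  console.log(pages);
});
```

The "fetched" lines appear in completion order (b, c, a), while the array resolved by Promise.all() keeps the input order. Pushing into a shared topics array from inside the callbacks therefore collects results in whatever order the requests finish; if document order matters, return each page's topics from the callback and read them from Promise.all()'s result instead.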

Try it this way:

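// Collect the hrefs first; .get() turns the cheerio result into a plain array
const hrefs = contentsPage(contentsSelector).map((idx, topicsAnchor) => {
  return topicsAnchor.attribs['href'];
}).get();

// Declare the loop variable: a bare `href` would be an implicit global (an error in strict mode)
for (const href of hrefs) {
  // This await pauses the enclosing async function, so each page is fetched before the next starts
  const topicsHtml = await rp(href);
  console.log("topicsHtml.length: ", topicsHtml.length);
}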

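Now the await sits in the async function's own for...of loop instead of inside a map() or each() callback. An await inside a callback only pauses that callback, so the surrounding code keeps running; an await in the loop body pauses the whole function. Note that this version fetches the pages one at a time.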
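
A minimal sketch of the loop's behavior, with a hypothetical timer-based fakeRequest standing in for rp() and a hypothetical sequentialFetch wrapper: each request starts only after the previous one has finished, and AAAA is logged last.

```javascript
// Hypothetical stand-in for rp(): resolves with a fake page body after `ms` milliseconds
function fakeRequest(url, ms) {
  return new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), ms));
}

async function sequentialFetch(urls) {
  const log = [];
  for (const url of urls) {
    log.push(`start ${url}`);
    // The loop itself pauses here, so the next iteration waits for this request
    const html = await fakeRequest(url, 10);
    log.push(`done ${url} (${html.length} chars)`);
  }
  log.push('AAAA');
  return log;
}

sequentialFetch(['a.html', 'b.html']).then((log) => console.log(log));
```

The trade-off: this is easy to reason about and gentler on the server, but the total time is roughly the sum of all request times, whereas the map()/Promise.all() approach runs the requests concurrently.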

Based on the other answers here, I came to a rather elegant solution. Note that the .map() callback avoids async/await entirely: it just returns the promise from rp(). An await inside a callback only pauses that callback, and cheerio's each() (like any caller that discards the callback's return value) never waits for it, so all the waiting is done at the top level with Promise.all():

async function parseAutodeskSpec(contentsHtmlFile) {
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';

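  // The callback is synchronous: it only starts a request and returns rp()'s promise
  const contentsReqs = contentsPage(contentsSelector)
    .map((idx, elem) => rp(contentsPage(elem).attr('href')))
    .toArray();

  // Await first, then map: in `await Promise.all(x).map(...)` the .map call is made on the
  // Promise object itself (a native Promise has no map method), not on the resolved array
  const pages = await Promise.all(contentsReqs);
  const topicsReqs = pages.map(html => parseAutodeskTopics(html));

  // parseAutodeskTopics (not defined in the question) may itself be async, so wait for it too
  return Promise.all(topicsReqs);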
}
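
A minimal sketch of the operator-precedence pitfall, assuming hypothetical stand-ins: fakeRequest for rp() and fakeParseTopics for parseAutodeskTopics(). Property access binds tighter than await, so chaining .map() directly after await Promise.all(...) throws a TypeError; awaiting first and then mapping works. The .flat() call in rightOrder is an extra step not in the answer above, added here to merge the per-page arrays into one topics list.

```javascript
// Hypothetical stand-ins for rp() and parseAutodeskTopics()
function fakeRequest(url) {
  return new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 5));
}
async function fakeParseTopics(html) {
  return [`topic from ${html}`];
}

async function wrongOrder(urls) {
  // .map is looked up on the Promise returned by Promise.all, not on the resolved array
  return await Promise.all(urls.map(fakeRequest)).map(fakeParseTopics);
}

async function rightOrder(urls) {
  const pages = await Promise.all(urls.map(fakeRequest)); // plain array of HTML strings
  const topicLists = await Promise.all(pages.map(fakeParseTopics)); // one array per page
  return topicLists.flat(); // merge into a single list of topics
}

wrongOrder(['a.html']).catch((err) => console.log(err.constructor.name)); // TypeError
rightOrder(['a.html', 'b.html']).then((topics) => console.log(topics));
```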
