Async not awaiting function before running

I'm trying to parse a specification website from saved HTML on my computer. I can post the file upon request.

I'm burnt out trying to figure out why it won't run in order. The log output should show the CCCCs first, then the BBBBs, and finally a single AAAA.

The code I'm running does not wait at the first hurdle (it prints AAAA... first). Am I using request-promise incorrectly? What is going on?

Is this due to cheerio's .each() method (I'm assuming it's synchronous)?

const rp = require('request-promise');
const fs = require('fs');
const cheerio = require('cheerio');

async function parseAutodeskSpec(contentsHtmlFile) {
  const topics = [];
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';

  contentsPage(contentsSelector).each(async (idx, topicsAnchor) => {
    const topicsHtml = await rp(topicsAnchor.attribs['href']);
    console.log("topicsHtml.length: ", topicsHtml.length);
  });

  console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");

  return topics;
}

As @lumio stated in his comment, I also think this happens because the each function is synchronous: it calls the async callback and moves on without waiting for the promise that callback returns.
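To see the effect in isolation, here is a minimal sketch that uses Array.prototype.forEach as a stand-in for cheerio's .each() and a hypothetical fakeRequest helper in place of rp (both are assumptions made so the snippet runs without any dependencies). The iterator ignores the promise returned by each async callback, so the line after the loop runs first.

```javascript
// Hypothetical stand-in for rp(url): resolves with a fake page after a short delay.
const fakeRequest = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));

async function demoEach(urls) {
  const log = [];

  // forEach calls the callback and discards whatever it returns,
  // so the promise created by each async callback is never awaited.
  urls.forEach(async (url) => {
    const html = await fakeRequest(url);
    log.push(`fetched ${url} (${html.length} chars)`);
  });

  // Reached immediately, before any request has finished.
  log.push('AAAA');

  // Wait long enough for the stray callbacks to finish,
  // purely so the final order can be inspected.
  await new Promise((resolve) => setTimeout(resolve, 50));
  return log;
}

demoEach(['a', 'b']).then((log) => console.log(log));
```

The first entry in the printed log is AAAA, which matches the behaviour described in the question.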

You should use the map method instead, and call Promise.all() on the result to wait until every request has finished:

const obj = contentsPage(contentsSelector).map(async (idx, topicsAnchor) => {
  const topicsHtml = await rp(topicsAnchor.attribs['href']);
  console.log("topicsHtml.length: ", topicsHtml.length);

  const topicsFromPage = await parseAutodeskTopics(topicsHtml)
  console.log("topicsFromPage.length: ", topicsFromPage.length);

  // push mutates topics in place; concat returns a new array and would leave topics unchanged
  topics.push(...topicsFromPage);
})

// .get() turns the cheerio object into a plain array of the promises returned by the callback
await Promise.all(obj.get())

console.log("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA");
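The same pattern can be sketched without cheerio or request-promise. In the snippet below, fakeRequest and fetchAll are hypothetical names, and a plain array stands in for the list of anchors. The callbacks start all requests at once, and Promise.all resolves only when every one of them has finished, with results in input order.

```javascript
// Hypothetical stand-in for rp(url): resolves with a fake page after `ms` milliseconds.
const fakeRequest = (url, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), ms));

async function fetchAll(urls) {
  const log = [];

  // map collects the promise returned by each async callback.
  // Later URLs get shorter delays here so that they finish first.
  const pending = urls.map(async (url, idx) => {
    const html = await fakeRequest(url, 10 * (urls.length - idx));
    log.push(`fetched ${url}`);
    return html.length;
  });

  // Resolves once every promise has resolved; results keep the input order.
  const lengths = await Promise.all(pending);

  log.push('AAAA'); // now guaranteed to come after every fetch
  return { lengths, log };
}

fetchAll(['a', 'bb']).then(({ lengths, log }) => console.log(lengths, log));
```

Note that the requests complete in whatever order the server answers, so anything logged inside the callbacks may be interleaved. Only the array returned by Promise.all is ordered.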

Try it this way:

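// Collect the hrefs synchronously; .get() returns a plain array
const hrefs = contentsPage(contentsSelector).map((idx, topicsAnchor) => {
  return topicsAnchor.attribs['href']
}).get()

// A plain for...of loop lets await pause the enclosing async function
for (const href of hrefs) {
  const topicsHtml = await rp(href);
  console.log("topicsHtml.length: ", topicsHtml.length);
}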

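Now the await sits in a plain loop instead of inside a map or each callback, where it doesn't work the way you might expect. The requests run one after another, and the code after the loop runs only once all of them have finished.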
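A small sketch of the resulting order, again with a hypothetical fakeRequest helper standing in for rp and a plain array standing in for the extracted hrefs:

```javascript
// Hypothetical stand-in for rp(url): resolves with a fake page after a short delay.
const fakeRequest = (url) =>
  new Promise((resolve) => setTimeout(() => resolve(`<html>${url}</html>`), 10));

async function fetchSequentially(urls) {
  const log = [];

  for (const url of urls) {
    log.push(`start ${url}`);
    // await pauses this function, so the next iteration
    // does not begin until the current request has resolved.
    await fakeRequest(url);
    log.push(`done ${url}`);
  }

  log.push('AAAA'); // reached only after the last request
  return log;
}

fetchSequentially(['a', 'b']).then((log) => console.log(log));
```

The trade-off is speed: each request waits for the previous one, so this is slower than the Promise.all approach when there are many pages, but it is gentler on the server and keeps the output in order.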

Based on the other answers here, I arrived at a rather elegant solution. Note that it avoids async / await in the .map() callback: cheerio's callbacks (and, from what I've learned about async / await, callbacks in general) do not wait for the promise that an async callback returns, so an await inside the callback does not pause the surrounding function:

async function parseAutodeskSpec(contentsHtmlFile) {
  const contentsPage = cheerio.load(fs.readFileSync(contentsHtmlFile).toString());
  const contentsSelector = '.content_htmlbody table td div div#divtreed0e338374 nobr .toc_entry a.treeitem';

  const contentsReqs = contentsPage(contentsSelector)
    .map((idx, elem) => rp(contentsPage(elem).attr('href')))
    .toArray();

  // The parentheses matter: await the array of responses first, then call Array's .map on it
  const topicsReqs = (await Promise.all(contentsReqs))
    .map(html => parseAutodeskTopics(html));

  return await Promise.all(topicsReqs);
}
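One detail worth checking in code of this shape is operator precedence. Member access binds tighter than await, so `await Promise.all(reqs).map(...)` calls .map on the promise itself, and a native promise has no such method. The sketch below uses hypothetical function names and plain numbers in place of HTML responses to show the difference.

```javascript
async function withoutParentheses(reqs) {
  // Parsed as: await (Promise.all(reqs).map(...)).
  // A native promise has no .map method, so this throws a TypeError.
  return await Promise.all(reqs).map((x) => x * 2);
}

async function withParentheses(reqs) {
  // Await the array of results first, then map over the array.
  return (await Promise.all(reqs)).map((x) => x * 2);
}

withoutParentheses([Promise.resolve(1), Promise.resolve(2)])
  .catch((err) => console.log('rejected with', err.name));

withParentheses([Promise.resolve(1), Promise.resolve(2)])
  .then((values) => console.log(values));
```

Splitting the expression into two statements (await into a variable, then map) avoids the issue as well and may read more clearly.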
