Node js Puppeteer goto array of pages

I'm trying to visit the pages in my array one by one, but I get this:

(node:4196) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 request listeners added. Use emitter.setMaxListeners() to increase limit
(node:4196) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 framedetached listeners added. Use emitter.setMaxListeners() to increase limit
(node:4196) MaxListenersExceededWarning: Possible EventEmitter memory leak detected. 11 lifecycleevent listeners added. Use emitter.setMaxListeners() to increase limit
(node:4196) UnhandledPromiseRejectionWarning: Error: Protocol error (Page.navigate): Target closed.
    at Promise (D:\Kutz\irrParse\node_modules\puppeteer\lib\Connection.js:198:56)
    at new Promise (<anonymous>)
    at CDPSession.send (D:\Kutz\irrParse\node_modules\puppeteer\lib\Connection.js:197:12)
    at navigate (D:\Kutz\irrParse\node_modules\puppeteer\lib\Page.js:520:39)
    at Page.goto (D:\Kutz\irrParse\node_modules\puppeteer\lib\Page.js:500:7)
    at uniqueLinks.forEach (D:\Kutz\irrParse\scrape.js:26:16)
    at Array.forEach (<anonymous>)
    at D:\Kutz\irrParse\scrape.js:25:15
    at <anonymous>
    at process._tickCallback (internal/process/next_tick.js:118:7)
(node:4196) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
(node:4196) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
(node:4196) UnhandledPromiseRejectionWarning: Error: Navigation Timeout Exceeded: 30000ms exceeded
    at Promise.then (D:\Kutz\irrParse\node_modules\puppeteer\lib\NavigatorWatcher.js:71:21)
    at <anonymous>

const puppeteer = require("puppeteer");
var forEach = require('async-foreach').forEach;


const url = "https://reddit.com/r/programming";
const linkSelector = ".content a.title";

(async () => {
  // Launch chrome process
  const browser = await puppeteer.launch({headless: true});
  const page = await browser.newPage();

  await page.goto(url, { waitUntil: "load" });

  // This runs the `document.querySelectorAll` within the page and passes
  // the result to function
  const links = await page.$$eval(linkSelector, links => {
    return links.map((link) => link.href);
  });

  // Make sure we get the unique set of links only
  const uniqueLinks = [...links];
  //console.log(uniqueLinks[0]);

  uniqueLinks.forEach(async (link) => {
    await page.goto(link, { waitUntil: "load" });
  });

  // Kill the browser process
  await browser.close();
})();

The error is thrown inside forEach().

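Unfortunately, Array.prototype.forEach does not wait for an async callback the way you might expect. It ignores the promise each call returns, so every page.goto starts immediately on the same page, and browser.close() runs while those navigations are still pending. That is consistent with both the listener warnings and the "Target closed" error. A plain for loop, placed inside the same async function, awaits each navigation before starting the next and should work for what you're trying to do: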

for (let i = 0; i < uniqueLinks.length; i ++) {
  await page.goto(uniqueLinks[i], { waitUntil: "load" });
}
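
To see the difference without launching a browser, here is a minimal sketch. The names sleep, fakeGoto, withForEach and withForOf are hypothetical and not part of Puppeteer; fakeGoto is only a stand-in for page.goto that resolves after a short delay. It records the order in which the "navigations" and the "browser closed" step happen under each looping style.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for page.goto(link): records the visit after a delay.
async function fakeGoto(link, events) {
  await sleep(10);
  events.push(`loaded ${link}`);
}

// forEach discards the promises returned by the async callback,
// so the code after the loop runs before any navigation finishes.
async function withForEach(links) {
  const events = [];
  links.forEach(async (link) => {
    await fakeGoto(link, events);
  });
  events.push("browser closed");
  await sleep(50); // only here so the stray navigations show up in the log
  return events;
}

// A for...of loop inside an async function awaits each navigation in turn.
async function withForOf(links) {
  const events = [];
  for (const link of links) {
    await fakeGoto(link, events);
  }
  events.push("browser closed");
  return events;
}

(async () => {
  console.log(await withForEach(["a", "b"]));
  // [ 'browser closed', 'loaded a', 'loaded b' ]
  console.log(await withForOf(["a", "b"]));
  // [ 'loaded a', 'loaded b', 'browser closed' ]
})();
```

In the real script, the same ordering problem means browser.close() is reached while page.goto calls are still in flight. Wrapping each awaited page.goto in try/catch would additionally keep one slow or failing link from rejecting the whole run.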
