
TypeError: Cannot read property 'match' of undefined in Node.js and puppeteer

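I am trying to filter an array that contains a bunch of URLs. I need to return only the URLs that contain the phrase "media-release". At the moment it just throws an error. I tried deleting my package-lock.json, but it still doesn't work.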

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.cbp.gov/newsroom/media-releases/all');
  const data = await page.evaluate(() => {
    const nodeList = document.getElementsByClassName('survey-processed');
    const urls = [];

    for (i=0; i<nodeList.length; i++) {
      urls.push(document.getElementsByClassName('survey-processed')[i].href);
    }
    const regex = new RegExp('/media-release\\b', 'g');
    const links = urls.filter(element => element.match(regex));
    return links;
  });
  console.log(data);
  await browser.close();
})();

Error:

(node:10208) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'match' of undefined
    at __puppeteer_evaluation_script__:11:50
    at Array.filter (<anonymous>)
    at __puppeteer_evaluation_script__:11:24
    at ExecutionContext._evaluateInternal (C:\\Users\\Documents\\\\node_modules\\puppeteer\\lib\\cjs\\puppeteer\\common\\ExecutionContext.js:217:19)
    at processTicksAndRejections (internal/process/task_queues.js:86:5)

After inspecting the page, I found that some elements with the class survey-processed are not a elements. There are two forms: form#search-block-form.survey-processed and form#views-exposed-form-newsroom-page.survey-processed.

form elements don't have an href property, so for those elements href is undefined. Calling match on undefined is what causes the error.
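The failure can be reproduced without a browser. Here is a minimal sketch, assuming plain objects stand in for the DOM elements (the URL and the object shapes are made up for illustration):

```javascript
// Plain objects standing in for DOM elements; these values are illustrative only.
const elements = [
  { tag: "a", href: "https://example.com/newsroom/media-release/one" },
  { tag: "form" }, // no href property, like a <form> element
];

// The second entry becomes undefined because that object has no href.
const hrefs = elements.map((el) => el.href);

let errorName = null;
try {
  hrefs.filter((href) => href.match(/\/media-release\b/));
} catch (err) {
  errorName = err.name; // calling .match on undefined throws a TypeError
}

// Skipping non-string entries before testing avoids the exception.
const safe = hrefs.filter(
  (href) => typeof href === "string" && /\/media-release\b/.test(href)
);

console.log(errorName, safe);
```

The guard only hides the problem, though. Selecting the right elements in the first place is the cleaner fix.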

To fix this, you have to be more specific about which elements you select. Use querySelectorAll with the selector "a.survey-processed", like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll("a.survey-processed");  // get only <a> elements that have the classname 'survey-processed'
    const urls = [];

    for (let i = 0; i < nodeList.length; i++) {                        // for each one of those
        if(/\/media-release\b/.test(nodeList[i].href)) {               // if the 'href' attribute matches the regex (use 'test' here rather than 'match')
            urls.push(nodeList[i].href);                               // push the 'href' attribute to the array
        }
    }

    return urls;
});
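The comment in the snippet above prefers test over match. A small sketch of the difference follows, using hypothetical URLs that are not taken from the page. It also shows that \b only matches when the character after "release" is not a word character:

```javascript
// Hypothetical URLs used only to illustrate the regex.
const pattern = /\/media-release\b/;

const singular = "https://example.com/newsroom/media-release/some-title";
const plural = "https://example.com/newsroom/media-releases/all";

// test() answers yes/no, which is all a filter condition needs.
const singularMatches = pattern.test(singular); // true: "/" follows "release"
const pluralMatches = pattern.test(plural);     // false: "s" follows "release", so \b fails

// match() builds a result array (or returns null), which is more than a boolean check needs.
const matchResult = singular.match(pattern);

console.log(singularMatches, pluralMatches, matchResult && matchResult[0]);
```

The pattern here has no g flag. A regex with the g flag keeps its lastIndex between test() calls, which can produce surprising results when it is reused inside a loop.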

Also, if you are only looking for URLs that contain the phrase "/media-release", you can shorten the code further with the CSS attribute contains selector [attribute*=value], like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll('a.survey-processed[href*="/media-release"]');  // get only <a> elements that have the classname 'survey-processed' and whose 'href' attribute contains the phrase "/media-release"
    return Array.from(nodeList).map(element => element.href);  // convert the NodeList into an array and use 'map' to get the 'href' attributes
});
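Puppeteer also provides page.$$eval, which runs querySelectorAll for a selector and passes the matched elements, as an array, to a callback inside the page. The sketch below relies on that. The helper name collectMediaReleaseLinks is hypothetical, and it assumes the page has already navigated to the listing:

```javascript
// Hypothetical helper: `page` is a Puppeteer Page that has already loaded the listing.
async function collectMediaReleaseLinks(page) {
  // $$eval runs the callback in the page with an array of the elements matched by the selector.
  return page.$$eval(
    'a.survey-processed[href*="/media-release"]',
    (anchors) => anchors.map((anchor) => anchor.href)
  );
}
```

It would be called as `const data = await collectMediaReleaseLinks(page);` after `page.goto(...)`, in place of the page.evaluate call.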

You can actually return the filtered result directly and use .includes() to check whether the URL contains "media-release":

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.cbp.gov/newsroom/media-releases/all");
  const data = await page.evaluate(() => {
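    return [...document.querySelectorAll(".survey-processed")]
      .map(({ href }) => href) // return href strings; DOM elements cannot be serialized out of evaluate
      .filter((href) => href?.includes("media-release")); // ?. skips elements without an href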
  });
  console.log(data);
  await browser.close();
})();

