
TypeError: Cannot read property 'match' of undefined in Node.js and puppeteer

I'm trying to filter an array that contains a bunch of URLs. I need to return only the URLs that contain the word "media-release", but the script currently just throws the error below. I also tried removing my package-lock.json, but it still doesn't work.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.cbp.gov/newsroom/media-releases/all');
  const data = await page.evaluate(() => {
    const nodeList = document.getElementsByClassName('survey-processed');
    const urls = [];

    for (let i = 0; i < nodeList.length; i++) {
      urls.push(nodeList[i].href);
    }
    const regex = new RegExp('/media-release\\b', 'g');
    const links = urls.filter(element => element.match(regex));
    return links;
  });
  console.log(data);
  await browser.close();
})();

Error:

(node:10208) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'match' of undefined
    at __puppeteer_evaluation_script__:11:50
    at Array.filter (<anonymous>)
    at __puppeteer_evaluation_script__:11:24
    at ExecutionContext._evaluateInternal (C:\\Users\\Documents\\\\node_modules\\puppeteer\\lib\\cjs\\puppeteer\\common\\ExecutionContext.js:217:19)
    at processTicksAndRejections (internal/process/task_queues.js:86:5)

After inspecting the page, I found that some elements with the class survey-processed are not <a> elements. There are two forms: form#search-block-form.survey-processed and form#views-exposed-form-newsroom-page.survey-processed.

<form> elements don't have an href property, so element.href is undefined for them, and calling .match() on undefined is what causes the error.
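The failure can be reproduced without a browser. This minimal sketch stands in for the DOM with plain objects (the fakeNodes array, its URLs, and the two helper functions are hypothetical, made up for illustration). An entry without an href yields undefined, and calling .match() on it throws a TypeError. Keeping only the <a> entries, as the selector a.survey-processed does on the real page, avoids it.

```javascript
// Hypothetical stand-ins for elements with the class "survey-processed".
// A real <form> has no href property, so it is simply left out here.
const fakeNodes = [
  { tagName: "A", href: "https://example.com/newsroom/media-release/1" },
  { tagName: "FORM" }, // href is undefined, like form#search-block-form
  { tagName: "A", href: "https://example.com/newsroom/about" },
];

const regex = /\/media-release\b/;

// Mirrors the question's code: collect every href, then call match() on each.
function filterAllNodes(nodes) {
  const urls = nodes.map((node) => node.href); // contains undefined for the form
  return urls.filter((url) => url.match(regex)); // throws when url is undefined
}

// Mirrors the fix: keep only <a> elements, so every href is a string.
function filterAnchorsOnly(nodes) {
  return nodes
    .filter((node) => node.tagName === "A")
    .map((node) => node.href)
    .filter((url) => regex.test(url));
}

try {
  filterAllNodes(fakeNodes);
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(filterAnchorsOnly(fakeNodes));
// [ 'https://example.com/newsroom/media-release/1' ]
```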

To fix this, be more specific when selecting the elements: use querySelectorAll with the selector "a.survey-processed", like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll("a.survey-processed");  // get only <a> elements that have the classname 'survey-processed'
    const urls = [];

    for (let i = 0; i < nodeList.length; i++) {                        // for each one of those
        if(/\/media-release\b/.test(nodeList[i].href)) {               // if the 'href' attribute matches the regex (use 'test' here rather than 'match')
            urls.push(nodeList[i].href);                               // push the 'href' attribute to the array
        }
    }

    return urls;
});
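Two details of the regex are worth checking in isolation. First, \b requires a word boundary right after "release", so a path such as /media-releases/ does not match. Second, a regex with the g flag (as in the question's new RegExp(..., 'g')) remembers lastIndex between test() calls when it is reused, so the same string can alternately pass and fail. The URLs below are made up for illustration.

```javascript
const url = "https://example.com/media-release/1";

// \b needs a boundary after "release": a following "/" qualifies, an "s" does not.
const wordBoundary = /\/media-release\b/;
console.log(wordBoundary.test(url)); // true
console.log(wordBoundary.test("https://example.com/media-releases/all")); // false

// A reused regex with the g flag keeps lastIndex between test() calls:
// the second call starts searching after the first match and fails.
const globalRe = /\/media-release\b/g;
const results = [globalRe.test(url), globalRe.test(url), globalRe.test(url)];
console.log(results); // [ true, false, true ]

// String.prototype.match with a g regex starts from index 0 on every call,
// so match() is not affected by the leftover lastIndex.
console.log(url.match(globalRe)); // [ '/media-release' ]
```

Dropping the g flag, or writing the regex literal inline as in the loop above, keeps test() stateless.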

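Also, if you only want URLs that contain the string "/media-release", you can use CSS's attribute substring selector [attribute*=value] to shorten the code further. Note that *= is a plain substring match on the attribute value, so unlike the \b regex above it also matches paths such as "/media-releases":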

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll('a.survey-processed[href*="/media-release"]');  // get only <a> elements that have the classname 'survey-processed' and whose 'href' attribute contains the phrase "/media-release"
    return Array.from(nodeList).map(element => element.href);  // convert the NodeList into an array and use 'map' to get the 'href' attributes
});

You could also return the filtered result directly, using .includes() to check whether each href contains "media-release":

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.cbp.gov/newsroom/media-releases/all");
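  const data = await page.evaluate(() => {
    // Return href strings rather than the elements themselves:
    // DOM elements can't be passed back to Node.js as usable objects.
    return [...document.querySelectorAll(".survey-processed")]
      .filter(({ href }) => href?.includes("media-release")) // ?. skips <form> elements, whose href is undefined
      .map(({ href }) => href);
  });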
  console.log(data);
  await browser.close();
})();
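The filter-and-map step of this answer can be tried on plain objects. This sketch uses hypothetical stand-ins for the page's elements and a hypothetical helper name, mediaReleaseHrefs. Optional chaining (href?.includes(...)) evaluates to undefined for the entry without an href, which filter treats as falsy. The final map turns the surviving elements into strings, which, unlike DOM elements, can be returned from page.evaluate as-is.

```javascript
// Hypothetical stand-ins for the elements matched by ".survey-processed".
const elements = [
  { href: "https://example.com/newsroom/media-release/a" },
  { action: "/search" }, // like a <form>: no href property
  { href: "https://example.com/newsroom/about" },
];

function mediaReleaseHrefs(nodes) {
  return nodes
    // href?.includes(...) is undefined when href is missing, so that entry is dropped
    .filter(({ href }) => href?.includes("media-release"))
    // keep plain strings, not the element objects themselves
    .map(({ href }) => href);
}

console.log(mediaReleaseHrefs(elements));
// [ 'https://example.com/newsroom/media-release/a' ]
```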
