TypeError: Cannot read property 'match' of undefined in Node.js and puppeteer

Question

I'm trying to filter an array of URLs and return only those that contain "media-release". Running the script currently throws the error shown below. I tried deleting my package-lock.json, but that didn't help.

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://www.cbp.gov/newsroom/media-releases/all');
  const data = await page.evaluate(() => {
    const nodeList = document.getElementsByClassName('survey-processed');
    const urls = [];

    for (i=0; i<nodeList.length; i++) {
      urls.push(document.getElementsByClassName('survey-processed')[i].href);
    }
    const regex = new RegExp('/media-release\\b', 'g');
    const links = urls.filter(element => element.match(regex));
    return links;
  });
  console.log(data);
  await browser.close();
})();

Error:

(node:10208) UnhandledPromiseRejectionWarning: Error: Evaluation failed: TypeError: Cannot read property 'match' of undefined
    at __puppeteer_evaluation_script__:11:50
    at Array.filter (<anonymous>)
    at __puppeteer_evaluation_script__:11:24
    at ExecutionContext._evaluateInternal (C:\\Users\\Documents\\\\node_modules\\puppeteer\\lib\\cjs\\puppeteer\\common\\ExecutionContext.js:217:19)
    at processTicksAndRejections (internal/process/task_queues.js:86:5)

Answer 1

After inspecting the page, I found that some elements with the class survey-processed are not <a> elements: there are two forms, form#search-block-form.survey-processed and form#views-exposed-form-newsroom-page.survey-processed.

form elements don't have an href property, so their href is undefined. Calling .match on undefined is what causes the error.
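
The failure can be reproduced without a browser. Below is a minimal sketch that uses plain objects as hypothetical stand-ins for the DOM elements (the example.com URLs are made up for illustration). It also shows that a type guard alone would avoid the crash:

```javascript
// Plain objects standing in for DOM elements (hypothetical data, not taken from the real page).
const fakeElements = [
  { tagName: 'A', href: 'https://example.com/newsroom/media-release/one' },
  { tagName: 'FORM' }, // no href, so reading .href yields undefined
  { tagName: 'A', href: 'https://example.com/about' },
];

const regex = /\/media-release\b/;
const urls = fakeElements.map(el => el.href); // the middle entry is undefined

let errorName = null;
try {
  urls.filter(url => url.match(regex)); // throws when it reaches the undefined entry
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError

// Guarding against non-string values avoids the crash.
const safeLinks = urls.filter(url => typeof url === 'string' && regex.test(url));
console.log(safeLinks);
```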

To fix this, select the elements more specifically: use querySelectorAll with the selector "a.survey-processed", like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll("a.survey-processed");  // get only <a> elements that have the classname 'survey-processed'
    const urls = [];

    for (let i = 0; i < nodeList.length; i++) {                        // for each one of those
        if(/\/media-release\b/.test(nodeList[i].href)) {               // if the 'href' attribute matches the regex (use 'test' here rather than 'match')
            urls.push(nodeList[i].href);                               // push the 'href' attribute to the array
        }
    }

    return urls;
});

Also, if you only need the URLs that contain the phrase "/media-release", you can use the CSS attribute-contains selector [attribute*=value] to shorten the code further, like so:

const data = await page.evaluate(() => {
    const nodeList = document.querySelectorAll('a.survey-processed[href*="/media-release"]');  // get only <a> elements that have the classname 'survey-processed' and whose 'href' attribute contains the phrase "/media-release"
    return Array.from(nodeList).map(element => element.href);  // convert the NodeList into an array and use 'map' to get the 'href' attributes
});
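
One difference between the two snippets is worth checking before choosing: the regex /\/media-release\b/ requires a word boundary after "release", while [href*="/media-release"] is a plain substring test on the attribute text. A small sketch with made-up URLs (assumptions for illustration, not taken from the page) shows where they disagree:

```javascript
// Hypothetical URLs for illustration only.
const sampleUrls = [
  'https://example.com/newsroom/media-release/some-story',
  'https://example.com/newsroom/media-releases/all',
];

// Word-boundary regex, as in the loop-based snippet.
const byRegex = sampleUrls.filter(url => /\/media-release\b/.test(url));

// Plain substring check, which mirrors how [href*="/media-release"] matches.
const bySubstring = sampleUrls.filter(url => url.includes('/media-release'));

console.log(byRegex.length);     // 1 ("media-releases" has no boundary after "release")
console.log(bySubstring.length); // 2
```

If links under "/media-releases" should be excluded, the regex version is probably the safer choice.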

Answer 2

You could filter the elements directly inside page.evaluate and use .includes() to check whether each href contains "media-release". The optional chaining (href?.) skips elements that have no href:

const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("https://www.cbp.gov/newsroom/media-releases/all");
  const data = await page.evaluate(() => {
    return [
      ...document.querySelectorAll(".survey-processed")
    ]
      .filter(({ href }) => href?.includes("media-release"))
      .map(({ href }) => href); // return strings: DOM elements cannot be serialized back to Node
  });
  console.log(data);
  await browser.close();
})();
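
Another option is to collect every href in the page and do the filtering in Node, which keeps the filtering logic testable without launching a browser. The sketch below assumes a hypothetical helper named filterUrls and made-up input data:

```javascript
// Hypothetical helper: keeps only string URLs that contain the given fragment.
function filterUrls(hrefs, fragment = 'media-release') {
  return hrefs.filter(href => typeof href === 'string' && href.includes(fragment));
}

// Simulated list of collected href values (made-up data).
// The null and undefined entries stand in for elements without an href, such as forms.
const collected = [
  'https://example.com/newsroom/media-release/one',
  null,
  'https://example.com/contact',
  undefined,
  'https://example.com/newsroom/media-release/two',
];

const mediaReleaseUrls = filterUrls(collected);
console.log(mediaReleaseUrls);
```

With Puppeteer, the input list could come from page.$$eval(".survey-processed", els => els.map(el => el.href)), which runs the mapping function in the page and returns the result to Node.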

solution1
2 ACCEPTED 2020-10-22 11:27:50

solution2
1 2020-10-22 11:27:54
