
How can you attach TypeScript/JavaScript functions to the Puppeteer page context?

I'm writing a web scraping application in TypeScript with Puppeteer. To make the scraping easier, I'm "attaching" a JavaScript file of utility functions to the page instance, using Puppeteer's page.addScriptTag function. Here's what one of the utility functions on the page might look like:

// functions.ts

export const getLink = (node: Element) => {
  let link = node.querySelector("a");
  return link ? link.href : null;
};
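For context, the attach step can be sketched roughly as follows. The `ScriptHost` interface and the `attachHelpers` name are made up for this illustration (a real Puppeteer `Page` has an `addScriptTag` method that accepts a `path` option), and the path to the compiled file is an assumption:

```typescript
// Minimal structural type covering only the part of Puppeteer's Page
// that this sketch uses (hypothetical name, for illustration).
interface ScriptHost {
  addScriptTag(options: { path?: string; content?: string }): Promise<unknown>;
}

// Hypothetical helper: inject the compiled helpers file into the page so
// that its top-level functions become globals in the browser context.
async function attachHelpers(
  page: ScriptHost,
  compiledPath: string
): Promise<void> {
  // The file at compiledPath is assumed to be plain JavaScript with no
  // import/export statements, since it is loaded as a classic script.
  await page.addScriptTag({ path: compiledPath });
}
```

Usage would be something like `await attachHelpers(page, "./dist/functions.js")` after the page has been created and navigated.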

Then you can use the functions inside page.evaluate:

// process.ts

import * as puppeteer from "puppeteer";
import { getLink } from "../functions";

interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate((selector) => {
    const node = document.querySelector(selector);
    return node ? getLink(node) : null; // I'm using the function here.
  }, selector);

The problem is that when I do this, the imports fail at runtime during development. I believe this is because the compiled import and export syntax does not work inside Chrome. Here's the error I get:

Could not get links.  Error: Evaluation failed: ReferenceError: src_1 is not defined
    at __puppeteer_evaluation_script__:2:20
    at ExecutionContext._evaluateInternal (/Users/harrisoncramer/Desktop/Code/projects/gql3.0_schedulers/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:217:19)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async ExecutionContext.evaluate (/Users/harrisoncramer/Desktop/Code/projects/gql3.0_schedulers/node_modules/puppeteer/lib/cjs/puppeteer/common/ExecutionContext.js:106:16)
Evaluation failed: ReferenceError: src_1 is not defined
    at __puppeteer_evaluation_script__:2:20

I've got a hacky workaround: I'm punching the functions.ts file into a compiler, and then removing all of the export keywords from the functions.js file. Then, I'm removing all of the import statements from inside the process.ts file, like this:

// functions.js

const getLink = (node) => {
  let link = node.querySelector("a");
  return link ? link.href : null;
};

// process.ts

import * as puppeteer from "puppeteer";

// Turning off this import...
// import { getLink } from "../functions";

interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate((selector) => {
    const node = document.querySelector(selector);
    return node ? getLink(node) : null; // I'm using the function here.
  }, selector);

This, however, breaks the type checking during development! What's a better way of solving this problem? How can one import compiled JavaScript functions onto the page without breaking the TypeScript type-checking?

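Anything inside page.evaluate is serialized to a string and run in the page, much as if you had typed it into Chrome's DevTools console. Modules imported on the Node side do not exist in that context: the compiled callback refers to the import through a module alias (the src_1 in your error), and that name is not defined in the browser. Arguments to page.evaluate must also be serializable, so a function cannot be passed in directly. What you can do is pass the function's source text and rebuild it inside the page, which keeps the import (and the type checking) on the Node side:

// process.ts

import * as puppeteer from "puppeteer";
import { getLink } from "../functions";

interface LinkArgs {
  page: puppeteer.Page;
  selector: string;
}

export const getLinkFromPage = async ({ page, selector }: LinkArgs) =>
  page.evaluate(
    (selector, getLinkSource) => {
      // Rebuild the helper from its source text inside the page.
      // The cast is type-only, so it is erased at compile time.
      const getLinkInPage = new Function(
        `return (${getLinkSource})`
      )() as typeof getLink;
      const node = document.querySelector(selector);
      return node ? getLinkInPage(node) : null;
    },
    selector,
    getLink.toString() // Only the source text crosses into the page.
  );

This only works for helpers that are self-contained: anything the helper references outside its own body will be missing in the page as well.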
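The underlying behaviour can be reproduced in plain Node, without Puppeteer. The sketch below is only an approximation of what happens to a page.evaluate callback: `rebuild` and `demo` are hypothetical names, and `new Function` stands in for the browser re-creating a function from its source text in a scope that knows nothing about your modules.

```typescript
// Hypothetical helper: re-create a function from its source text in a
// fresh scope, so none of the original closure variables are visible.
function rebuild<T>(source: string): T {
  return new Function(`return (${source})`)() as T;
}

function demo(): { selfContained: number; closure: string } {
  // Stands in for an imported module.
  const helpers = { double: (n: number) => n * 2 };

  // Self-contained: everything it needs is inside its own body.
  const selfContained = (n: number) => n * 2;
  // Closure: depends on `helpers`, which exists only in this scope.
  const usesHelpers = (n: number) => helpers.double(n);

  const a = rebuild<typeof selfContained>(selfContained.toString())(21);

  let b: string;
  try {
    rebuild<typeof usesHelpers>(usesHelpers.toString())(21);
    b = "no error";
  } catch (err) {
    // The rebuilt function cannot see `helpers` any more.
    b = (err as Error).name;
  }
  return { selfContained: a, closure: b };
}
```

Calling `demo()` shows the self-contained function surviving the round trip, while the one that leans on an outer variable fails with a ReferenceError, which is the same class of failure as the src_1 error in the question.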
