
Puppeteer: Remove links from page

I am converting a webpage into a .pdf file with the help of Node.js and Puppeteer.

This works fine, but I want to remove all links on the page before converting it, because otherwise the .pdf file includes these links, which can't be opened in my app when someone clicks on them. Is there a way to do so?

The page is an .aspx page which uses JavaScript. The links all start with "javascript:__". It is an intranet page which shows our meals, and I just want to display the meal plan as a .pdf.

What I have in my .js file looks like this:

const puppeteer = require('puppeteer');
let url = 'http://my-url.de/meals.aspx'
let browser = await puppeteer.launch()
let page = await browser.newPage()
await page.goto(url, {waitUntil: 'networkidle2' })
await page.pdf({
    format:"A4",
    path:files[0],
    displayHeaderFooter: false,
    printBackground:true
})

In my app it says "URL can't be opened", which is why I want these links removed.


It seems that these are not proper links; at least, they are not <a> tags with an href pointing to a website.

Instead, you are dealing with links that require JavaScript to navigate, and that's why they do not work in the PDF.

What you could do is replace all these invalid hrefs with something valid for a PDF before capturing the page.

Check my attempt below. It's possible that you will need to modify it a bit to suit your case, since I don't have access to the actual website you are trying to parse.

const puppeteer = require('puppeteer');
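// The trailing semicolon matters: without it, the "(" that starts the
// async function below is parsed as a call on this string.
let url = 'http://my-url.de/meals.aspx';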

(async() => {
  let browser = await puppeteer.launch()
  let page = await browser.newPage()
  await page.goto(url, {
    waitUntil: 'networkidle2'
  })

  // Modifying the page here
  await page.evaluate(_ => {
    // Select all links whose href starts with "javascript"
    // and point them at "#" instead.
    document.querySelectorAll('a[href^="javascript"]')
      .forEach(a => {
        a.href = '#'
      })
  });

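  await page.pdf({
    format: "A4",
    path: files[0], // `files` comes from the question's code and is not defined in this snippet
    displayHeaderFooter: false,
    printBackground: true
  })

  // Close the browser so the Node.js process can exit
  await browser.close()
})()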
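
Since the question asks to remove the links rather than retarget them, another option is to drop the href attribute altogether. An <a> element without an href is not a hyperlink, so the text stays on the page but nothing clickable should end up in the PDF. The following is only a sketch under that assumption: the names stripJavascriptLinks and savePageAsPdf, and the outputPath parameter, are made up for illustration, and I have not run it against the actual intranet page.

```javascript
// Runs inside the page when passed to page.evaluate(); `root` then defaults
// to the page's document. A different root can be passed in for testing.
function stripJavascriptLinks(root = document) {
  const links = root.querySelectorAll('a[href^="javascript:"]');
  // An <a> without an href attribute is no longer a hyperlink.
  links.forEach(a => a.removeAttribute('href'));
  return links.length; // number of links that were neutralised
}

// Hypothetical wrapper: `url` and `outputPath` are placeholders for your own values.
async function savePageAsPdf(url, outputPath) {
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    const removed = await page.evaluate(stripJavascriptLinks);
    await page.pdf({ format: 'A4', path: outputPath, printBackground: true });
    return removed;
  } finally {
    // Always close the browser, even if navigation or PDF generation fails.
    await browser.close();
  }
}
```

It could be called as savePageAsPdf('http://my-url.de/meals.aspx', files[0]) from inside an async function. The returned count is a quick way to check whether the selector matched anything on your page; if it comes back as 0, the hrefs probably look different from what I assumed.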


 