How to use Puppeteer to download pdf

Question

I'm trying to do a bit of web scraping using Puppeteer, but I'm not sure how to actually download the documents I find. Specifically, I want to download the PDF from a page like this. The part of my code that tries to download the PDF currently looks like this (the commented-out lines are download attempts that didn't work):

                const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())));
                await page.click('#gvDocketResult_ctl0'+rows.length+'_hlDocumentRedacted');
                await page.waitFor(3000);
                const newPage = await newPagePromise;
                // need to figure out how to download
                await newPage._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/Users/me/Desktop'});
                // await newPage.pdf({path: 'hn.pdf', format: 'letter'});
                // await newPage.click('#download');
                // await newPage.click('#icon');

Sorry if this question seems really simple, I just started using Puppeteer a few days ago and am still a tad lost. If anyone knows how I should go about doing this, it would be very much appreciated.

EDIT: From what I've found so far, it seems that if I can get the link shown in the src = '' attribute on the page, I might be able to use page.goto(link) to download the PDF. In any case, I have no idea how to get to that link in Puppeteer, so any advice on that would also be appreciated.

Answer 1

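You can download the file from its direct link using streams.

const fs = require('fs');
const https = require('https');

// Run this inside an async function: read the PDF URL from the viewer element's src.
const fileUrl = await page.$eval('#plugin', file => file.src);
https.get(fileUrl, res => {
  // Pipe the response body straight into a local file.
  const stream = fs.createWriteStream('file.pdf');
  res.pipe(stream);
  stream.on('error', (err) => {
    console.error(err);
  });
  stream.on('finish', () => {
    stream.close();
  });
});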
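
If you need to know when the download has finished, the same stream approach can be wrapped in a promise. This is only a minimal sketch under a few assumptions: downloadTo is a hypothetical helper name (it is not part of Puppeteer), the URL is reachable without the browser's session cookies, and the server answers with a plain 200 rather than a redirect.

```javascript
const fs = require('fs');
const http = require('http');
const https = require('https');

// Hypothetical helper (not a Puppeteer API): stream a URL into a local file
// and resolve with the destination path once the file is fully written.
function downloadTo(fileUrl, destPath) {
  // Pick the client module that matches the URL's protocol.
  const client = new URL(fileUrl).protocol === 'http:' ? http : https;
  return new Promise((resolve, reject) => {
    client.get(fileUrl, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status code ${res.statusCode}`));
        return;
      }
      const stream = fs.createWriteStream(destPath);
      res.pipe(stream);
      stream.on('error', reject);
      stream.on('finish', () => resolve(destPath));
    }).on('error', reject);
  });
}
```

With the fileUrl from above, you could then write await downloadTo(fileUrl, 'file.pdf') inside your async function. If the PDF is only served to a logged-in session, this request will probably fail, because it does not send the cookies held by the Puppeteer page.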

Answer 2

async function retira_ficheiro(page, link) {
  await page.evaluate((link) => {
    location.href = link;
  }, link);
}

retira_ficheiro(page2, your_link);

2 answers

solution1
0 2022-07-22 12:03:25

solution2
-1 2022-07-22 11:42:26
