
How to use Puppeteer to download a PDF

I'm trying to do a bit of web scraping using Puppeteer, but I'm not sure how to actually download the documents I find. Specifically, I want to download the PDF from a page that displays it in an embedded viewer. The part of my code that tries to download the PDF currently looks like this (the commented-out lines are download attempts that didn't work):

                const newPagePromise = new Promise(x => browser.once('targetcreated', target => x(target.page())));
                await page.click('#gvDocketResult_ctl0'+rows.length+'_hlDocumentRedacted');
                await page.waitFor(3000);
                const newPage = await newPagePromise;
                // need to figure out how to download
                await newPage._client.send('Page.setDownloadBehavior', {behavior: 'allow', downloadPath: '/Users/me/Desktop'});
                // await newPage.pdf({path: 'hn.pdf', format: 'letter'});
                // await newPage.click('#download');
                // await newPage.click('#icon');

Sorry if this question seems really simple, I just started using Puppeteer a few days ago and am still a tad lost. If anyone knows how I should go about doing this, it would be very much appreciated.

EDIT: From what I've found so far, it seems that if I can get the link stored in the src attribute of the embedded PDF element on the page, I might be able to use page.goto(link) to download the PDF. In any case, I have no idea how to get at that link in Puppeteer, so advice on that would also be appreciated.

You can download the file from its direct link using streams.

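const fs = require('fs');
const https = require('https');

// Read the PDF URL from the embedded viewer element, then stream it to disk.
// This must run inside an async function because it uses await.
const fileUrl = await page.$eval('#plugin', file => file.src);
https.get(fileUrl, res => {
  const stream = fs.createWriteStream('file.pdf');
  res.pipe(stream);
  stream.on('error', (err) => {
    console.error(err);
  });
  stream.on('finish', () => {
    stream.close();
  });
}).on('error', (err) => {
  // Request-level failures (DNS, refused connection) surface here.
  console.error(err);
});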
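
The same streaming idea can be wrapped in a function that returns a promise, so the scraper can await the finished download before moving on. This is only a sketch: the name downloadFile and the optional headers argument are assumptions and are not part of the answer above. If the site requires a login, the browser session's cookies would probably have to be passed along in headers, since a plain https.get call does not share the Puppeteer session.

```javascript
const fs = require('node:fs');
const http = require('node:http');
const https = require('node:https');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper: download `fileUrl` to `destPath` and resolve
// once the file has been fully written.
function downloadFile(fileUrl, destPath, headers = {}) {
  // Pick the client module that matches the URL scheme.
  const client = new URL(fileUrl).protocol === 'https:' ? https : http;
  return new Promise((resolve, reject) => {
    client.get(fileUrl, { headers }, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error(`Unexpected status ${res.statusCode} for ${fileUrl}`));
        return;
      }
      // pipeline() rejects on any stream error and cleans up both streams.
      pipeline(res, fs.createWriteStream(destPath)).then(resolve, reject);
    }).on('error', reject);
  });
}
```

With the fileUrl obtained from page.$eval as above, a call could look like await downloadFile(fileUrl, 'file.pdf'). Redirects are not followed in this sketch; a 3xx response is reported as an error.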

// Navigate the page to the file link from inside the browser context.
async function retira_ficheiro(page, link) {
  await page.evaluate((link) => {
    location.href = link;
  }, link);
}

retira_ficheiro(page2, your_link);
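
Navigating to the link only saves a file if the browser is allowed to download and knows where to put it. A possible way to combine this with the Page.setDownloadBehavior call from the question is sketched below. It assumes a Puppeteer version where page.target().createCDPSession() is available, and the name downloadViaNavigation is made up for illustration. Whether the browser downloads the PDF or renders it in a viewer may depend on the browser mode and on the server's response headers.

```javascript
// Hypothetical helper: allow downloads into `downloadPath`, then
// navigate the page to `link` so the browser fetches the file itself.
async function downloadViaNavigation(page, link, downloadPath) {
  // Open a DevTools protocol session through the public API
  // rather than the private page._client property.
  const client = await page.target().createCDPSession();
  await client.send('Page.setDownloadBehavior', {
    behavior: 'allow',
    downloadPath,
  });
  // Same navigation trick as above: point the page at the file URL.
  await page.evaluate((url) => {
    location.href = url;
  }, link);
}
```

Because the browser makes the request, the session's cookies are sent with it, which the plain https.get approach does not do on its own.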


 