
Is it possible to get pdf page using pyppeteer?

import asyncio
import pyppeteer
import logging
from pyppeteer import launch

pyppeteer.DEBUG = True
for name in logging.root.manager.loggerDict:
    logging.getLogger(name).disabled = True

async def main():
    browser = await launch(headless = False)
    page = await browser.newPage()
    await page.setJavaScriptEnabled(True)
    response = await page.goto('http://www.africau.edu/images/default/sample.pdf',
                               timeout=3000, waitUntil=['domcontentloaded', 'load', 'networkidle0'])
    content = await response.buffer()
    print(content)
    await browser.close()

asyncio.get_event_loop().run_until_complete(main())

Expected output: the content of http://www.africau.edu/images/default/sample.pdf

Actual output: b'df48fcc4-a0b0-4e86-b52e-0ec012ee791e'

Environment: Python 3, Ubuntu Linux

I'd suggest using pyppdf, a small library built on pyppeteer (the Python port of Puppeteer) for printing pages to PDF.

conda install -c defaults -c conda-forge pyppdf
OR
pip install pyppdf

It has a handy function, save_pdf:

def save_pdf(output_file: str = None, url: str = None, html: str = None, args_dict: Union[str, dict] = None, args_upd: Union[str, dict] = None, goto: str = None, dir_: str = None) -> bytes:

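Or you could simply use pyppeteer's own methods to capture the rendered page. Note that page.pdf() is only supported in headless mode, and that it prints the rendered page rather than downloading the original file: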

await page.screenshot({'path': 'ss.png'})
await page.pdf({'path': 'sample.pdf'})
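For context, the two calls above need a running browser and an open page. Here is a minimal sketch of a complete script, assuming pyppeteer is installed and that the target is an ordinary HTML page; the function name render_page_to_pdf and the A4 format are illustrative choices, not part of the original answer:

```python
import asyncio


async def render_page_to_pdf(url: str, output_path: str) -> bytes:
    """Render a web page to PDF with headless Chromium and return the PDF bytes."""
    # Imported lazily so this module can be loaded without pyppeteer installed.
    from pyppeteer import launch

    # page.pdf() is only supported in headless mode.
    browser = await launch(headless=True)
    try:
        page = await browser.newPage()
        # Wait until the network is idle so late-loading content is rendered.
        await page.goto(url, waitUntil='networkidle0')
        # Writes the file to output_path and also returns the bytes.
        return await page.pdf({'path': output_path, 'format': 'A4'})
    finally:
        # Always close the browser, even if navigation or printing fails.
        await browser.close()
```

It could be run with asyncio.get_event_loop().run_until_complete(render_page_to_pdf('https://example.com', 'page.pdf')), where the URL is only a placeholder. This produces a print of the rendered page, so it does not help when the URL already points at a PDF file.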

I'm aware that you are asking for a solution using pyppeteer, but honestly this can be done much more easily with requests.


import requests


def main():
    r = requests.get("http://www.africau.edu/images/default/sample.pdf")
    with open("sample.pdf", "wb") as file:
        file.write(r.content)

if __name__ == "__main__":
    main()

That's all; the PDF will be saved to a file called sample.pdf.
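A slightly more defensive variant is sketched below. It checks the HTTP status and verifies that the payload starts with the PDF signature %PDF- before writing it, which would reject a body like the b'df48fcc4-...' value from the question. The helper names looks_like_pdf and download_pdf and the 30-second timeout are assumptions made for illustration:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the payload begins with the PDF file signature."""
    return data.startswith(b"%PDF-")


def download_pdf(url: str, path: str, timeout: float = 30.0) -> int:
    """Download url to path and return the number of bytes written."""
    # Imported lazily so the signature check above works without requests installed.
    import requests

    response = requests.get(url, timeout=timeout)
    # Raise an exception on 4xx/5xx instead of saving an error page.
    response.raise_for_status()
    if not looks_like_pdf(response.content):
        raise ValueError("response body does not look like a PDF")
    with open(path, "wb") as file:
        file.write(response.content)
    return len(response.content)
```

For example, download_pdf("http://www.africau.edu/images/default/sample.pdf", "sample.pdf") would save the file only if the server actually returned a PDF.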

Many people have answered along these lines, and I tried it myself as well.
To answer the question exactly as asked, "Is it possible to get pdf page using pyppeteer?": no.

You can try headless = True, but it still does not work.
You can save an opened page as a PDF, but you can't fetch the requested PDF directly with pyppeteer and read its content from the response.
