Error generating PDF (blank or format error) - Playwright Python

Context:

  • Playwright Version: 1.29.1
  • Operating System: Windows
  • Python version: 3.8.2
  • Browser: Chromium

Describe the bug

This error happens in specific situations, usually when a PDF preview page is opened directly or indirectly.

from playwright.sync_api import sync_playwright

def test():
  with sync_playwright() as p:
    # With headless=False a blank PDF is produced; with headless=True
    # it does not work at all. Any suggestions?
    browser = p.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()
    page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
    page.pdf(path='Test.pdf', format='A4')

test()

For example, with the code snippet above, the generated PDF is empty (blank; the screenshots are black with no content). This looked like a timing problem, so I added a delay with sleep(), since wait_for_load_state() does not help in this case. However, the newly generated PDFs come out with the wrong formatting (the attached image shows a printout; I hid the content, but the layout is the same without the black areas).

[Image: screenshot of a generated PDF showing the incorrect layout]

My theory is that the page comes out like this because of the Chromium PDF viewer summary, so I tried to disable it with this code:

from time import sleep

from playwright.sync_api import sync_playwright

def test():
  with sync_playwright() as p:
    # browser = p.chromium.launch(headless=False)
    # Note: launch_persistent_context returns a BrowserContext, not a Browser.
    browser = p.chromium.launch_persistent_context(
      user_data_dir=r'C:\Users\pedro\AppData\Local\Temp\playwright_chromiumdev_profile-AidV4Q\Default',
      args=['--print-to-pdf', '--disable-extensions', '--print-to-pdf-no-header'],
      headless=False,
    )
    page = browser.new_page()
    page.goto('https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf')
    sleep(5)  # fixed delay so the PDF viewer has time to load
    page.pdf(path='test.pdf', format='A4')
    input()  # keep the browser open until Enter is pressed

test()

Still, I couldn't solve the problem. Additional details: I am unable to run this code in headless mode, and Chromium appears to be flagged as automated (which makes detection easier). Does anyone have a solution to my problem?

You can't do this because it is not allowed in headless mode: headless Chromium does not support navigating to a PDF document. This is noted in the official docs for page.goto: https://playwright.dev/python/docs/api/class-page#page-goto
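If the goal is only to save the PDF that the URL already serves, rather than to render a web page to PDF, one possible workaround is to skip navigation and fetch the file over HTTP through the context's request API. The sketch below is a suggestion, not something from the original answer; download_pdf and looks_like_pdf are hypothetical helper names, and the check for the "%PDF-" header is only a basic sanity test.

```python
from pathlib import Path


def looks_like_pdf(data: bytes) -> bool:
    """Return True if the bytes start with the PDF magic header."""
    return data.startswith(b"%PDF-")


def download_pdf(url: str, dest: str) -> Path:
    """Fetch `url` over HTTP via Playwright and save the bytes unchanged to `dest`."""
    # Imported here so looks_like_pdf can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()  # headless by default
        context = browser.new_context()
        # context.request sends the HTTP request directly and shares cookies
        # with the context; no page navigation or PDF viewer is involved.
        response = context.request.get(url)
        if not response.ok:
            raise RuntimeError(f"Download failed with HTTP status {response.status}")
        data = response.body()
        browser.close()

    if not looks_like_pdf(data):
        raise ValueError("Response does not look like a PDF file")
    out = Path(dest)
    out.write_bytes(data)
    return out


if __name__ == "__main__":
    download_pdf(
        "https://nyc3.digitaloceanspaces.com/midia/wp-content/uploads/2023/01/4tmrioel-sample.pdf",
        "Test.pdf",
    )
```

Because the file is written byte for byte, the layout problems caused by printing the PDF viewer page do not arise. This does not help if the PDF has to be produced from rendered HTML, in which case page.pdf() in headless Chromium is still the relevant call.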
