How to read this PDF form using PyPDF2 in Python

https://www.fda.gov/downloads/AboutFDA/ReportsManualsForms/Forms/UCM074728.pdf

I'm trying to read this PDF using PyPDF2 or pdfminer. PyPDF2 says the file has not been decrypted, and pdfminer says it can't decompress the PDF. Can somebody tell me how to do this in a Python 3 environment on Windows? I can't use Poppler because I can't install it on this Windows machine.

This is a restricted PDF file. A file that doesn't prompt you for a password has an empty user password, so in most cases you can decrypt it with PyPDF2 by passing an empty string:

from PyPDF2 import PdfFileReader

reader = PdfFileReader('sample.pdf')
# A restricted file that opens without a prompt uses an empty user password
if reader.isEncrypted:
    reader.decrypt('')

Unfortunately, that doesn't work for your file, or for any other file encrypted with 128-bit AES: PyPDF2's decrypt() method doesn't support that encryption level and raises a NotImplementedError.
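Before picking a workaround, it can help to check which encryption a file actually uses. The sketch below is a rough heuristic using only the standard library: it scans the raw bytes for the crypt-filter names defined in the PDF specification (/AESV2 for 128-bit AES, /AESV3 for 256-bit AES). The function name and the "RC4 or other" fallback label are illustrative assumptions, not part of PyPDF2, and the result is a hint rather than a guarantee.

```python
import sys


def describe_pdf_encryption(data: bytes) -> str:
    """Heuristically classify a PDF's encryption from its raw bytes.

    Assumes the encryption dictionary is stored as plain text in the file,
    which is how it normally appears; treat the result as a hint only.
    """
    # Encrypted PDFs reference their encryption dictionary via /Encrypt
    if b"/Encrypt" not in data:
        return "not encrypted"
    # Crypt filter methods: /AESV3 is 256-bit AES, /AESV2 is 128-bit AES
    if b"/AESV3" in data:
        return "AES-256"
    if b"/AESV2" in data:
        return "AES-128"
    # No AES crypt filter found: most likely RC4, or something unrecognized
    return "RC4 or other"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_encryption.py sample.pdf
    with open(sys.argv[1], "rb") as f:
        print(describe_pdf_encryption(f.read()))
```

If this reports AES-128 or AES-256, the empty-string decrypt() call is the one expected to fail, and one of the workarounds below applies.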

As a simple workaround, you can save the file as a new copy in Adobe Reader or a similar viewer; the new file should then work with your code.

You can also do it programmatically with qpdf, as discussed in this GitHub issue:

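import os
import shutil
import tempfile
from subprocess import check_call

def decrypt_pdf(filename):
    """Decrypt filename in place using qpdf (qpdf must be on PATH)."""
    # Create the temp dir next to the input so the final move stays on one drive
    tempdir = tempfile.mkdtemp(dir=os.path.dirname(os.path.abspath(filename)))
    try:
        temp_out = os.path.join(tempdir, 'qpdf_out.pdf')
        # An empty --password= works for files without an open password
        check_call(['qpdf', '--password=', '--decrypt', filename, temp_out])
        # Replace the original file with the decrypted copy
        shutil.move(temp_out, filename)
        print('File Decrypted')
    finally:
        shutil.rmtree(tempdir)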
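The snippet above assumes qpdf is on PATH. If, as with Poppler, you can't install software on this machine, one possible approach (an assumption about your setup) is to point at a qpdf executable you unpacked yourself. The hypothetical helper below builds the same qpdf argument list and fails with a clear message when no executable can be found:

```python
import shutil
from typing import List, Optional


def qpdf_decrypt_command(src: str, dst: str, qpdf: Optional[str] = None) -> List[str]:
    """Build a qpdf command that writes a decrypted copy of src to dst.

    qpdf is an optional explicit path to the executable (a hypothetical
    parameter for illustration); when omitted, qpdf is looked up on PATH.
    """
    exe = qpdf or shutil.which("qpdf")
    if exe is None:
        raise FileNotFoundError("qpdf not found on PATH; pass its full path via qpdf=")
    # Same flags as the qpdf call above: empty password, then strip encryption
    return [exe, "--password=", "--decrypt", src, dst]
```

For example: check_call(qpdf_decrypt_command('sample.pdf', 'decrypted.pdf', qpdf=r'C:\tools\qpdf\bin\qpdf.exe')), where the path is only a placeholder. Writing to a separate output file also avoids overwriting the original until you've confirmed the result opens.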
