
python-pptx: Dealing with password-protected PowerPoint files

I'm using a slightly modified version of the "Extract all text from slides in presentation" example at https://python-pptx.readthedocs.io/en/latest/user/quickstart.html to extract text from some PowerPoint slides.

I'm getting a PackageNotFoundError when I call Presentation() to open some of the PowerPoint files and read their text.

This appears to be due to the fact that, unbeknownst to me before I started this project, a few of the PowerPoint files are password protected.

I obviously don't expect to be able to read text from a password-protected file but is there a recommended way of dealing with password-protected PowerPoint files? Having my Python script die every time it runs into one is annoying.

I'd be fine with something that basically went: "Hi. The file you're trying to read may be password-protected. Skipping."

I tried using a try/except block to catch the PackageNotFoundError but then I got "NameError: name 'PackageNotFoundError' is not defined".

EDIT1: Here's a minimal case that generates the error:

EDIT2: See below for a working try/except block, thanks to TheGamer007's suggestion.

import pptx
from pptx import Presentation

password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"

prs = Presentation(password_protected_file)

And here's the error that is generated:

Traceback (most recent call last):
  File "T:/W/Wintermute/50 Sandbox/Pownall/Python/copy files/minimal_case_opening_file.py", line 6, in <module>
    prs = Presentation(password_protected_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\api.py", line 28, in Presentation
    presentation_part = Package.open(pptx).main_document_part
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\package.py", line 125, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\pkgreader.py", line 33, in from_file
    phys_reader = PhysPkgReader(pkg_file)
  File "C:\Anaconda3\lib\site-packages\python_pptx-0.6.18-py3.6.egg\pptx\opc\phys_pkg.py", line 32, in __new__
    raise PackageNotFoundError("Package not found at '%s'" % pkg_file)
pptx.exc.PackageNotFoundError: Package not found at 'C:\Users\J69401\Documents\password_protected_file.pptx'
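
The traceback shows the error being raised in phys_pkg.py, where python-pptx decides it cannot treat the path as a package. The same exception is raised for a path that simply does not exist, so it can help to tell the cases apart before opening the file. The sketch below uses only the standard library and rests on an assumption: that a password-encrypted .pptx is stored as an OLE compound file rather than a zip archive (legacy .ppt files use the same container, so a match is only a hint). The function name `classify_pptx` is hypothetical and not part of python-pptx.

```python
import os
import zipfile

# Signature at the start of an OLE compound file. Assumption for this sketch:
# Office stores password-encrypted documents in this container.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def classify_pptx(path):
    """Return 'missing', 'zip', 'ole', or 'unknown' for the file at path."""
    if not os.path.isfile(path):
        return "missing"
    if zipfile.is_zipfile(path):
        return "zip"  # looks like an ordinary .pptx package
    with open(path, "rb") as f:
        header = f.read(len(OLE_MAGIC))
    if header == OLE_MAGIC:
        return "ole"  # possibly password-protected, or a legacy .ppt
    return "unknown"
```

A result of "ole" would justify the "may be password-protected, skipping" message, while "missing" points to a wrong path instead.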

Here's the minimal case again, this time with a working try/except block.

from pptx import Presentation
# The exception class must be imported by name; otherwise the except
# clause fails with a NameError.
from pptx.exc import PackageNotFoundError

password_protected_file = r"C:\Users\J69401\Documents\password_protected_file.pptx"

try:
    prs = Presentation(password_protected_file)
except PackageNotFoundError:
    print("PackageNotFoundError generated - possible password-protected file.")
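
For a whole batch of files, the same try/except pattern can be wrapped in a loop so that one unreadable file does not stop the run. This is a minimal sketch, not a python-pptx feature: `open_all` and its parameters `opener` and `skip_on` are hypothetical names, and the opener and exception type are passed in so the helper itself depends only on the standard library.

```python
def open_all(paths, opener, skip_on):
    """Open each path with opener, skipping any path that raises skip_on.

    Returns (opened, skipped): a dict mapping path -> opened object,
    and a list of the paths that were skipped.
    """
    opened, skipped = {}, []
    for path in paths:
        try:
            opened[path] = opener(path)
        except skip_on:
            print(f"Skipping {path}: could not be opened "
                  "(possibly password-protected).")
            skipped.append(path)
    return opened, skipped


# Intended use with python-pptx (not executed here; my_paths is a placeholder):
#     from pptx import Presentation
#     from pptx.exc import PackageNotFoundError
#     opened, skipped = open_all(my_paths, Presentation, PackageNotFoundError)
```

Returning the skipped paths as well makes it easy to review afterwards which files were left out.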
