
Resolving page numbers from PyPDF2 getOutlines()

Original post: 2019-09-12 14:37:21 · Tags: python, pdf, pypdf2

I'm using PyPDF2 to process some PDF files. I'm hoping to extract outline/ToC data from files that have it, essentially to get a sense of which section of the document a given page corresponds to.

According to the docs, PdfFileReader's getOutlines method should return a nested list of Destination objects. Also according to the docs, each of these should have a page (int) attribute.

Unfortunately, this isn't the case with the files I've tried. Instead, I get IndirectObject instances, which resolve to PyPDF2.generic.DictionaryObject instances. I can't figure out how to get the Destination objects I'm expecting, or how to extract meaningful page numbers from the IndirectObject instances I'm getting instead.

The ultimate goal is, given an outline entry's page number, to pass that page number to getPage() and then call extractText().

Any guidance much appreciated. Thank you!

1 answer

PyPDF2.PdfFileReader has a getDestinationPageNumber method that gives you the page number from a Destination object.
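As a rough illustration of that approach, here is a minimal sketch that walks the nested list returned by getOutlines() and asks the reader for each entry's page number. It assumes the legacy camelCase API (PyPDF2 before 3.0) and that each entry exposes a "/Title" key; the helper names walk_outline and outline_pages are made up for this example. The page number is expected to be a zero-based index, and entries whose page cannot be found may come back as -1.

```python
from typing import Callable, Iterator, List, Tuple


def walk_outline(outline: list, page_number_of: Callable[[object], int],
                 depth: int = 0) -> Iterator[Tuple[int, str, int]]:
    """Yield (depth, title, page_number) for every entry in a nested outline."""
    for entry in outline:
        if isinstance(entry, list):
            # A nested list holds the children of the entry just before it.
            yield from walk_outline(entry, page_number_of, depth + 1)
        else:
            yield depth, entry["/Title"], page_number_of(entry)


def outline_pages(path: str) -> List[Tuple[int, str, int]]:
    """Hypothetical helper: read a PDF and resolve its outline to page numbers."""
    # Imported here so walk_outline stays usable without PyPDF2 installed.
    from PyPDF2 import PdfFileReader  # assumption: legacy API, PyPDF2 < 3.0

    with open(path, "rb") as fh:
        reader = PdfFileReader(fh)
        # Build the list while the file is still open.
        return list(walk_outline(reader.getOutlines(),
                                 reader.getDestinationPageNumber))
```

Each resulting page number could then be passed to getPage() and extractText(), as described in the question. Whether this works on files where getOutlines() yields plain dictionary objects instead of Destination objects is not something this sketch verifies.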

However, PyPDF2 is not really updated anymore, and outline iteration is broken on Python 3.7. Instead, you might want to try pikepdf, which also has outline support.
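For comparison, a minimal sketch of reading outline titles with pikepdf might look like the following. It relies on pikepdf's open_outline() context manager and the title and children attributes of outline items; the helper names iter_titles and outline_titles are invented for this example. Resolving each item's destination to a page number is deliberately left out, since that depends on how the destination is stored in the file.

```python
from typing import Iterable, Iterator, List, Tuple


def iter_titles(items: Iterable, depth: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) for outline items that have .title and .children."""
    for item in items:
        yield depth, item.title
        yield from iter_titles(item.children, depth + 1)


def outline_titles(path: str) -> List[Tuple[int, str]]:
    """Hypothetical helper: list every outline title in a PDF with its depth."""
    # Imported here so iter_titles stays usable without pikepdf installed.
    import pikepdf

    with pikepdf.open(path) as pdf:
        with pdf.open_outline() as outline:
            # Collect the titles while the PDF is still open.
            return list(iter_titles(outline.root))
```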

PyPDF2 append a PDF from the 2nd page

How can I rotate a page in pyPDF2?

Convert PDF page to image with pyPDF2 and BytesIO

PyPDF2 writer function creates blank page

Page number of added bookmarks with PyPDF2

Select only first page of PDF pypdf2

Page count after using PdfFileMerger() in pypdf2

PyPDF2 merging issue from file list

Issue extracting text from pdf with PyPDF2

PyPDF2: Reading a pdf from a zipfile


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


Answer 1 posted 2020-06-18 10:14:02
