
Does PyPDF2 work with PDFs in landscape mode?

I was writing a program that should extract text from a PDF using PyPDF2. When I ran it on a document laid out in portrait mode, it worked and printed the text. However, a second document laid out in landscape mode produced no text at all when run through the same program. Below is what my code currently looks like.

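from PyPDF2 import PdfFileReader

text = ""
# PdfFileReader accepts a file path; its second positional argument is `strict`, not a file mode.
pdf = PdfFileReader('TEST.pdf')
for i in range(pdf.getNumPages()):
    # extractText() returns an empty string when it cannot decode a page's text
    text += pdf.getPage(i).extractText()
print(text)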

What I'm wondering is basically: can PyPDF2 read documents that are in landscape mode, or does orientation matter when extracting text? For additional detail on the documents, the successful one was set in a "Grotesque Sans Serif" font (i.e. Helvetica), and the unsuccessful one in a "Slab Serif" font (i.e. Rockwell).
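To rule orientation in or out, it can help to check what each page actually reports. The following is only a sketch: `page_orientation` is a hypothetical helper, not part of PyPDF2. It assumes you pass in the page's media box width and height in points plus its `/Rotate` value in degrees. With PyPDF2 1.x these would presumably come from `page.mediaBox.getWidth()`, `page.mediaBox.getHeight()` and `page.get('/Rotate', 0)`, though `/Rotate` may be absent or inherited from a parent node.

```python
def page_orientation(width, height, rotation=0):
    """Classify a page as 'portrait' or 'landscape'.

    width and height are the media box dimensions in points;
    rotation is the page's /Rotate value in degrees (a multiple of 90).
    """
    # A 90 or 270 degree rotation swaps the displayed width and height.
    if rotation % 180 == 90:
        width, height = height, width
    return "landscape" if width > height else "portrait"


# US Letter is 612 x 792 points.
print(page_orientation(612, 792))      # unrotated tall page
print(page_orientation(792, 612))      # unrotated wide page
print(page_orientation(612, 792, 90))  # tall page displayed rotated
```

If both files report what you expect here and only one yields text, orientation alone is probably not the deciding factor.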

Below is what the PDFs look like. The first is the successful document, the second is the unsuccessful document: [enter image description here] [enter image description here]

It appears the second document is not working because it is PDF version 1.4, and PyPDF2 does not work with that version. The document that does work with the program is PDF version 1.5. For anyone else who comes across this issue, I would recommend using OCR instead of PyPDF2 if your PDF is version 1.4 or earlier. If it is 1.5 or later, PyPDF2 should work.
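To see which version a file declares before choosing between PyPDF2 and OCR, one option is to read the file header directly, since a PDF starts with a marker such as `%PDF-1.4`. This is a minimal sketch using only the standard library. `pdf_header_version` is a hypothetical helper name, and it only reports the header value (from PDF 1.4 onward the document catalog may declare a later version that overrides it).

```python
import re


def pdf_header_version(path):
    """Return the version declared in a PDF's header (e.g. '1.4'), or None."""
    # The "%PDF-x.y" marker normally sits at the very start of the file;
    # scanning the first kilobyte tolerates a little leading junk.
    with open(path, "rb") as f:
        head = f.read(1024)
    match = re.search(rb"%PDF-(\d+\.\d+)", head)
    return match.group(1).decode("ascii") if match else None
```

Calling `pdf_header_version('TEST.pdf')` on each of the two documents would show whether they really differ in declared version.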
