
How to detect a rotated page in a PDF document in Python?

Given a PDF document with multiple pages, how can I check whether a given page is rotated (-90, 90 or 180º)? Preferably using Python (pdfminer, pyPDF) ...

UPDATE: The pages are scanned, and most of each page consists of text.

I simply used the /Rotate attribute of the page in PyPDF2:

import PyPDF2

pdf = PyPDF2.PdfFileReader(open('example.pdf', 'rb'))
orientation = pdf.getPage(pagenumber).get('/Rotate')  # pagenumber is zero-based

It can be 0, 90, 180, 270 or None.
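
Because the value may be missing (None), and because some files are reported to store negative angles such as -90, it can help to normalize it before comparing. The following is a minimal sketch; `normalize_rotation` is a hypothetical helper name, not part of PyPDF2, and it assumes the raw value is either None or something `int()` accepts.

```python
def normalize_rotation(value):
    """Map a raw /Rotate value to one of 0, 90, 180, 270."""
    if value is None:
        return 0  # a missing /Rotate entry means the page is not rotated
    angle = int(value) % 360  # e.g. -90 becomes 270
    if angle % 90 != 0:
        # /Rotate is expected to be a multiple of 90
        raise ValueError(f"unexpected /Rotate value: {value!r}")
    return angle
```

It could be applied as `normalize_rotation(pdf.getPage(pagenumber).get('/Rotate'))`. Note that this only reads the rotation declared in the page dictionary; a scan that was fed sideways but stored without a /Rotate entry would still be reported as 0.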

If you're using pdfminer, you can get the rotation from the .rotate attribute of a PDFPage instance.

# doc is a PDFDocument and interpreter is a PDFPageInterpreter
for page in PDFPage.create_pages(doc):
    interpreter.process_page(page)
    r = page.rotate  # rotation in degrees

If you're using PDFMiner and want the orientation of each page:

from io import StringIO

from pdfminer.converter import TextConverter
from pdfminer.layout import LAParams
from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
from pdfminer.pdfpage import PDFPage

output_string = StringIO()
resource_manager = PDFResourceManager()
device = TextConverter(resource_manager, output_string, laparams=LAParams())
interpreter = PDFPageInterpreter(resource_manager, device)

with open('sample.pdf', 'rb') as fp:
    for page in PDFPage.get_pages(fp):
        # Extract the page text into output_string
        interpreter.process_page(page)

        # mediabox is [x0, y0, x1, y1]; compare width against height
        width = page.mediabox[2] - page.mediabox[0]
        height = page.mediabox[3] - page.mediabox[1]
        if width > height:
            orientation = 'Landscape'
        else:
            orientation = 'Portrait'
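
The mediabox comparison above describes the page before any rotation is applied. To get the orientation as displayed, the rotation can be taken into account as well, since a 90 or 270 degree rotation swaps width and height. A sketch under these assumptions: `mediabox` is an `(x0, y0, x1, y1)` sequence, `rotate` is an angle in degrees, and `displayed_orientation` is a hypothetical helper, not part of pdfminer.

```python
def displayed_orientation(mediabox, rotate=0):
    """Return 'Landscape' or 'Portrait' for the page as it is displayed."""
    x0, y0, x1, y1 = mediabox
    width, height = abs(x1 - x0), abs(y1 - y0)
    # A quarter turn in either direction swaps the two dimensions
    if (rotate or 0) % 180 == 90:
        width, height = height, width
    return 'Landscape' if width > height else 'Portrait'
```

Inside the loop this could be called as `displayed_orientation(page.mediabox, page.rotate)`.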


 