Determine Text Orientation in a PDF

Is there a way to detect whether the text on a page is in landscape or portrait orientation, using JS or any library? I cannot rely on width > height, as some pages are rotated as well. Portrait rotated page vs. landscape rotated page

I cannot rely on comparing width and height, or on checking whether the page is rotated, because both of these pages are rotated 90 degrees. What I cannot figure out is how to detect the orientation of the text itself.

I also do some preprocessing on the PDF using Node.js and pdfjs, so if either has an API or library that can give me this information, I would appreciate the help.

You can do this with Tesseract, which is mainly used for OCR. I use it with PHP, but you can also use it from JS: https://ourcodeworld.com/articles/read/580/how-to-convert-images-to-text-with-pure-javascript-using-tesseract-js

Tesseract can detect orientation. Here is some information on it using Python: "Is it possible to check orientation of an image before passing it through pytesseract ocr module"

All you would need to do is adapt this to JavaScript using the tool from the first link above.
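As a rough sketch of what that adaptation might look like: the code below assumes you already have a Tesseract.js worker and an image of the page (Tesseract works on images, so the PDF page has to be rendered to one first). It also assumes the worker's `detect()` call resolves to an object whose `data` carries `orientation_degrees` and `orientation_confidence`, which is how I understand the Tesseract.js output; check this against the version you install. The function names `classifyOrientation` and `detectTextOrientation` are made up for this example.

```javascript
// Map a detected rotation angle (in degrees) to a coarse label.
// 0 and 180 mean the text lines run horizontally on the image,
// 90 and 270 mean they run vertically (the text is sideways).
function classifyOrientation(degrees) {
  // Normalise into [0, 360) so negative or large angles are handled too.
  const normalized = ((Math.round(degrees) % 360) + 360) % 360;
  return normalized === 90 || normalized === 270 ? "vertical" : "horizontal";
}

// Run orientation detection on one page image.
// `worker` is expected to be a Tesseract.js worker; it is passed in so this
// helper does not depend on how the worker was created.
// ASSUMPTION: worker.detect(image) resolves to
//   { data: { orientation_degrees, orientation_confidence, ... } }
async function detectTextOrientation(worker, image) {
  const { data } = await worker.detect(image);
  return {
    degrees: data.orientation_degrees,
    confidence: data.orientation_confidence,
    label: classifyOrientation(data.orientation_degrees),
  };
}
```

You would create the worker with `createWorker` from the `tesseract.js` package and pass it in together with the rendered page image. Depending on the Tesseract.js version, orientation detection may need the OSD language data and the legacy engine to be enabled, so treat the worker setup as something to confirm in the library's documentation. A low confidence value is a hint that the page has too little text for the result to be trusted.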
