
Get dimensions and coordinates of textfields in PDF

Is it possible to get the X/Y coordinates and height/width of all text fields in a PDF document using PHP or a Linux library? I am using PDFTK to extract all text fields in the PDF, but it doesn't give me coordinate or dimension information. If not, is it possible to traverse the PDF document and calculate the x, y and height/width data for the text fields?

It's possible, but hard to do in practice.

You can open PDF documents in PHP using FPDI. It generates an abstract tree of PDF objects in memory. TCPDF and FPDF can save it back.

However, traversing said tree and finding the correct attributes is very. (I accidentally the verb.)

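Now, the PDF format is actually partly human-readable: the object structure is plain text, even though many streams inside it are compressed. And it would certainly contain the coordinates in a readable format (in points IIRC, i.e. 1/72 inch, measured from the bottom-left corner of the page by default). For form fields, the place to look is the /Rect array of each field's widget annotation, which holds two opposite corners of the box. So you might possibly discover it with a simple regex. Some nodes just need to be gzuncompress()ed first, and you are not attempting to modify the document or save it back anyway. So, try FPDI and print_r() to devise a strategy.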
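To make the regex idea concrete, here is a minimal sketch in Python (not PHP, but the same patterns would work with preg_match_all). It assumes the field dictionaries are stored uncompressed and that each one carries its own /T name and /Rect array; the function name field_boxes and the sample bytes are made up for illustration. It will not see fields that live inside compressed object streams, and it ignores parent/child field hierarchies, so treat it as a way to explore a file rather than a parser.

```python
import re

# One indirect object: "<num> <gen> obj ... endobj"
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)
# /Rect [x0 y0 x1 y1], two opposite corners in points
RECT_RE = re.compile(
    rb"/Rect\s*\[\s*([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)\s*\]"
)
# /T (field name), only the simple literal-string form
NAME_RE = re.compile(rb"/T\s*\((.*?)\)")


def field_boxes(pdf_bytes):
    """Return name, x, y, width, height for each widget object found."""
    boxes = []
    for match in OBJ_RE.finditer(pdf_bytes):
        body = match.group(3)
        if b"/Widget" not in body:
            continue  # not a form-field widget annotation
        rect = RECT_RE.search(body)
        if rect is None:
            continue
        x0, y0, x1, y1 = (float(v) for v in rect.groups())
        name = NAME_RE.search(body)
        boxes.append({
            "name": name.group(1).decode("latin-1") if name else None,
            "x": min(x0, x1),
            "y": min(y0, y1),
            "width": abs(x1 - x0),
            "height": abs(y1 - y0),
        })
    return boxes


# Hand-written stand-in for the relevant part of a PDF file (hypothetical data).
SAMPLE = b"""4 0 obj
<< /Type /Annot /Subtype /Widget /FT /Tx /T (first_name)
   /Rect [72 700 272 720] >>
endobj
5 0 obj
<< /Type /Annot /Subtype /Widget /FT /Tx /T (email)
   /Rect [72 660.5 322 680.5] >>
endobj
"""

for box in field_boxes(SAMPLE):
    print(box)
```

On a real file you would read the bytes with open(path, "rb").read() and pass them in. If nothing is found, the fields are probably inside compressed streams and need to be inflated first.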

Yeah, it's not too hard. The best tool I know for the job is pdfminer. It's Python, but if you don't want to use Python, you can just dump the PDF info in XML format and parse that with your weapon of choice :) Reply if you have trouble :)
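As a rough sketch of the "dump to XML and parse it" route: the code below assumes an object-level XML dump of the kind pdfminer's dumppdf.py tool writes, with each PDF dictionary rendered as alternating <key> and <value> elements. The exact element names in SAMPLE_DUMP are an assumption, as is the helper name boxes_from_dump, so compare against a real dump of your file before relying on it. It uses Python's standard library only; any XML parser in another language would do the same job.

```python
import xml.etree.ElementTree as ET


def dict_items(dict_el):
    """Pair each <key> child with the <value> child that follows it."""
    children = list(dict_el)
    return {k.text: v for k, v in zip(children[0::2], children[1::2])}


def boxes_from_dump(xml_text):
    """Collect name, x, y, width, height from objects that carry a Rect."""
    boxes = []
    for obj in ET.fromstring(xml_text).iter("object"):
        dict_el = obj.find("dict")
        if dict_el is None:
            continue
        items = dict_items(dict_el)
        rect = items.get("Rect")
        if rect is None:
            continue
        nums = [float(n.text) for n in rect.iter("number")]
        if len(nums) != 4:
            continue  # e.g. Rect stored as an indirect reference
        x0, y0, x1, y1 = nums
        name_el = items["T"].find("string") if "T" in items else None
        boxes.append({
            "id": obj.get("id"),
            "name": name_el.text if name_el is not None else None,
            "x": min(x0, x1),
            "y": min(y0, y1),
            "width": abs(x1 - x0),
            "height": abs(y1 - y0),
        })
    return boxes


# Hand-written example in the assumed dump layout (hypothetical data).
SAMPLE_DUMP = """<pdf>
<object id="4">
<dict size="3">
<key>Subtype</key>
<value><literal>Widget</literal></value>
<key>T</key>
<value><string size="10">first_name</string></value>
<key>Rect</key>
<value><list size="4">
<number>72</number>
<number>700</number>
<number>272</number>
<number>720</number>
</list></value>
</dict>
</object>
</pdf>"""

for box in boxes_from_dump(SAMPLE_DUMP):
    print(box)
```

Note that other annotations (links, for instance) also have a Rect, so on a real document you would probably also filter on the Subtype being Widget.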
