
Is there a way to measure the margins of a PDF using Python?

I've been using various Python packages to parse PDFs, and I'm wondering whether it's possible to measure the margins of a particular line in the document. Ideally, the measurement would be in CSS-style pixels.
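On the units: PDF coordinates are expressed in points, 1/72 inch (assuming the page does not set a `UserUnit` other than the default of 1), while CSS defines its reference pixel as 1/96 inch. A measurement taken from a PDF can therefore be converted to CSS pixels with a fixed factor of 96/72 = 4/3. A minimal sketch; `pt_to_css_px` is just an illustrative helper name:

```python
# PDF user space is measured in points: 1 pt = 1/72 inch
# (assuming the page's UserUnit is the default 1).
PT_PER_INCH = 72
# CSS defines the reference pixel as 1/96 inch.
CSS_PX_PER_INCH = 96


def pt_to_css_px(points: float) -> float:
    """Convert a length in PDF points to CSS pixels."""
    return points * CSS_PX_PER_INCH / PT_PER_INCH


# A 1-inch (72 pt) margin corresponds to 96 CSS px.
print(pt_to_css_px(72))  # → 96.0
```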

It doesn't need to be precise. I just need to figure out whether a line is left-aligned, centered, or right-aligned based on its margin, measured from the left edge.

Example:

# margin <= x
left-aligned

# margin >= y && margin <= z
                            center-aligned

# margin >= z
                                                              right-aligned
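The threshold rules in the example can be sketched as a small function. This is only an illustration: `classify_by_left_margin` is a hypothetical helper, x, y and z are placeholder thresholds assumed to satisfy x < y < z, and because the rules leave a gap between x and y (and overlap at z), this sketch reports the gap as "unknown" and treats a margin of exactly z as right-aligned.

```python
def classify_by_left_margin(margin: float, x: float, y: float, z: float) -> str:
    """Classify a line by its left margin using thresholds x < y < z.

    margin <= x      -> left-aligned
    y <= margin < z  -> center-aligned
    margin >= z      -> right-aligned
    anything else (x < margin < y) matches no rule -> "unknown"
    """
    if not x < y < z:
        raise ValueError("thresholds must satisfy x < y < z")
    if margin <= x:
        return "left-aligned"
    if margin >= z:
        return "right-aligned"
    if margin >= y:
        return "center-aligned"
    return "unknown"


# Hypothetical thresholds, in whatever unit the margins are measured in.
for m in (0, 50, 200, 350):
    print(m, classify_by_left_margin(m, x=10, y=150, z=300))
```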

Obviously this is just an example, but the differences between margins will not be large. The PDFs I'm parsing will likely have (in CSS terms):

  • margin-left: 0
  • margin-left: x
  • margin-left: y

The actual values of x and y are unimportant; what matters is that they will be consistent.
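Since the margins are expected to take only a few consistent values, one option is to let the data define the levels: sort the measured left margins, group values that lie within a small tolerance of each other, and treat each group as one margin level. A minimal sketch; `margin_levels`, `level_index` and the 2-point tolerance are assumptions for illustration, not part of any PDF library:

```python
from __future__ import annotations


def margin_levels(margins: list[float], tolerance: float = 2.0) -> list[float]:
    """Group nearly equal left margins and return one value (the mean) per group.

    Margins are sorted; a margin within `tolerance` of the previous one joins
    the current group, otherwise it starts a new group.
    """
    levels: list[float] = []
    group: list[float] = []
    for m in sorted(margins):
        if group and m - group[-1] > tolerance:
            levels.append(sum(group) / len(group))
            group = []
        group.append(m)
    if group:
        levels.append(sum(group) / len(group))
    return levels


def level_index(margin: float, levels: list[float]) -> int:
    """Return the index of the level closest to `margin` (0 = leftmost)."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - margin))


margins = [0.0, 36.0, 72.0, 35.5, 0.0, 36.5]  # hypothetical measurements in points
levels = margin_levels(margins)
print(levels)                     # → [0.0, 36.0, 72.0]
print(level_index(36.2, levels))  # → 1
```

Because each value is compared only with its sorted predecessor, a slow drift of many small steps can chain into one group; with a handful of well-separated margins, as described here, that should not be an issue.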

Sorry if this is confusing. The main thing I'm asking for is clarification on, or help with, figuring out the left margin of every line in a PDF.

Disclaimer: I am the author of borb, the library used in this answer.

You can use SimpleLineOfTextExtraction in borb, which returns the lines of text in a PDF.

You can check out this class here.

Each line has a content box (and a layout box), which can give you information about the location of that particular line of text.
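Once you have a line's box and the page's box, the left and right margins are simply the differences between their horizontal extents. A minimal sketch; `Box` and `horizontal_margins` are hypothetical helpers (not borb's API), and the box is assumed to be given as x, y, width and height in PDF points, with the origin at the bottom-left of the page:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned box in PDF points (origin at the bottom-left)."""
    x: float
    y: float
    width: float
    height: float


def horizontal_margins(line: Box, page: Box) -> tuple[float, float]:
    """Return the (left, right) margins of a line's box inside the page box."""
    left = line.x - page.x
    right = (page.x + page.width) - (line.x + line.width)
    return left, right


page = Box(0, 0, 612, 792)   # US Letter, 8.5 x 11 inches
line = Box(72, 700, 300, 12)
print(horizontal_margins(line, page))  # → (72, 240)
```

Only the horizontal coordinates matter for alignment; the y position is useful for ordering lines from top to bottom.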

You can use this to determine whether a line is left-, right-, or center-aligned by comparing it to the lines above and below it.
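The neighbour comparison can be sketched independently of how the boxes are obtained: a line shares its left edge with a left-aligned neighbour, its right edge with a right-aligned one, and its horizontal centre with a centred one. This is an illustrative sketch, not borb code; `alignment_vs_neighbours` is a hypothetical helper, each line is assumed to be an `(x0, x1)` horizontal extent in points listed in reading order, and the 1-point tolerance is an arbitrary choice.

```python
from __future__ import annotations


def alignment_vs_neighbours(
    lines: list[tuple[float, float]], i: int, tol: float = 1.0
) -> str:
    """Guess the alignment of lines[i] from the line above and the line below.

    Returns "left", "right", "center" or "unknown". The first neighbour that
    provides evidence decides; neighbours with identical extents are skipped
    because they cannot distinguish the three cases.
    """
    x0, x1 = lines[i]
    for j in (i - 1, i + 1):
        if not 0 <= j < len(lines):
            continue
        n0, n1 = lines[j]
        same_left = abs(x0 - n0) <= tol
        same_right = abs(x1 - n1) <= tol
        same_centre = abs((x0 + x1) / 2 - (n0 + n1) / 2) <= tol
        if same_left and same_right:
            continue  # identical extents: no evidence either way
        if same_left:
            return "left"
        if same_right:
            return "right"
        if same_centre:
            return "center"
    return "unknown"


lines = [(72, 540), (72, 400), (200, 412), (180, 432), (300, 540), (250, 540)]
print([alignment_vs_neighbours(lines, i) for i in range(len(lines))])
# → ['left', 'left', 'center', 'center', 'right', 'right']
```

A single line with no similar neighbour (a lone heading, for example) comes back as "unknown"; for those, a threshold on the margin itself is the fallback.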

You can find an example of how to use this class here.

Essentially, you open a document using the PDF.loads method, passing along an EventListener.
