简体   繁体   English

有没有办法使用 python 测量 pdf 的边距?

[英]is there a way to measure margins of a pdf using python?

I've been using different python packages to parse PDFs, but I'm wondering if it's possible to measure the margins of a particular line in the document.我一直在使用不同的 python 包来解析 PDF,但我想知道是否可以测量文档中特定行的边距。 The measurement I would like is for it to be in pixels css-style, if possible.如果可能的话,我想要的测量值是像素 css 样式。

It doesn't need to be so specific, just to figure out if a line is left-aligned, centered, or right-aligned based on margins, starting from left-to-right.它不需要那么具体,只是根据边距从左到右判断一条线是左对齐、居中还是右对齐。

Example:例子:

# margin <= x
left-aligned

# margin >= y && margin <= z
                            center-aligened

# margin >= z
                                                              right-aligned

Obviously this is just an example, but the margin differential will not be large, meaning, PDFs I'm parsing will likely have (in css terms):显然这只是一个例子,但边距差异不会很大,这意味着,我正在解析的 PDF 可能会有(以 css 的形式):

  • margin-left: 0
  • margin-left: x
  • margin-left: y

x, y actual value are unimportant, the important thing is that they'll be consistent. x, y的实际值并不重要,重要的是它们是一致的。

Sorry if this is confusing, the main thing I'm asking for is clarification or help in figuring out left-margin for every line in a pdf.抱歉,如果这令人困惑,我主要要求的是澄清或帮助计算 pdf 中每一行的左边距。

disclaimer: I am the author of borb , the library used in this answer免责声明:我是borb的作者,这个答案中使用的库

You can SimpleLineOfTextExtraction in borb , which returns the lines of text in a PDF.您可以在borb中使用SimpleLineOfTextExtraction ,它返回 PDF 中的文本行。

You can check out this class here .您可以在此处查看此 class。

Each line has a content box (and a layout box), which can give you information about the location of that particular line of text.每行都有一个内容框(和一个布局框),它可以为您提供有关该特定文本行位置的信息。

You can use this to determine whether a line is left/right/middle aligned by comparing it to lines above/below it.您可以使用它来确定一条线是否左/右/中对齐,方法是将它与其上方/下方的线进行比较。

You can find an example of how to use this class here .您可以在此处找到有关如何使用此 class 的示例。

Essentially you open a document using the PDF.loads method, passing along an EventListener .本质上,您使用PDF.loads方法打开一个文档,并传递一个EventListener

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM