
How do I crop each page in a PDF using PDFBox in Java?

I want to remove the bottom part of each page in a PDF without changing the page size. What is the recommended way to do this in Java with PDFBox? In other words, how do I remove the footer from each page of a PDF?

Is there a way to use a PDRectangle to delete all the text and images inside it?

Here is a snippet of what I tried. Using a rectangle with setCropBox seems to lose the page size; maybe the CropBox is not intended for this?

    // Start from the page's current crop box and raise only its bottom edge.
    PDRectangle rectangle = new PDRectangle();
    rectangle.setUpperRightY(mypage.findCropBox().getUpperRightY());
    rectangle.setLowerLeftY(50); // new bottom edge at y = 50, in user space units
    rectangle.setUpperRightX(mypage.findCropBox().getUpperRightX());
    rectangle.setLowerLeftX(mypage.findCropBox().getLowerLeftX());
    mypage.setCropBox(rectangle);
    croppedDoc.addPage(mypage);
    croppedDoc.save(filename);
    croppedDoc.close();

The closest example I could find in the PDFBox cookbook shows how to remove an entire page, but that is not what I'm looking for; I'd like to delete just a few elements from the page: http://pdfbox.apache.org/userguide/cookbook.html

I'm also a newbie, but take a look at this page, in particular the description of TrimBox. If there's no TrimBox on the page, it defaults to the CropBox, which would cause what you're seeing.

In general, don't expect the PDFBox docs to tell you much about PDF itself. To use PDFBox well, I think you need to go elsewhere; AFAIK, mostly just to the PDF specification. I haven't even skimmed it yet, though!

The CropBox is the way to go if you want to remove a portion of a page while keeping a rectangular region visible. If you want the page size to remain the same, the MediaBox has to remain the same.

From the PDF spec:

CropBox - rectangle (Optional; inheritable) A rectangle, expressed in default user space units, defining the visible region of default user space. When the page is displayed or printed, its contents are to be clipped (cropped) to this rectangle and then imposed on the output medium in some implementation-defined manner (see Section 10.10.1, “Page Boundaries”). Default value: the value of MediaBox.

MediaBox - rectangle (Required; inheritable) A rectangle (see Section 3.8.4, “Rectangles”), expressed in default user space units, defining the boundaries of the physical medium on which the page is intended to be displayed or printed (see Section 10.10.1, “Page Boundaries”).
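To make the relationship between the two boxes concrete, here is a minimal plain-Java sketch of the geometry only. It does not call PDFBox: `CropSketch`, `Box`, `effectiveCropBox` and `hideFooter` are hypothetical names made up for illustration, with `Box` standing in for a rectangle type such as PDRectangle. It assumes the footer occupies a strip of known height at the bottom of the page.

```java
// Geometry-only sketch; nothing here is PDFBox API.
final class CropSketch {

    /** Hypothetical stand-in for a PDF rectangle (lower-left and upper-right corners). */
    static final class Box {
        final float llx, lly, urx, ury; // default user space units

        Box(float llx, float lly, float urx, float ury) {
            this.llx = llx;
            this.lly = lly;
            this.urx = urx;
            this.ury = ury;
        }

        float width()  { return urx - llx; }
        float height() { return ury - lly; }
    }

    /** Per the spec text quoted above, a page without a CropBox uses its MediaBox. */
    static Box effectiveCropBox(Box mediaBox, Box cropBoxOrNull) {
        return cropBoxOrNull != null ? cropBoxOrNull : mediaBox;
    }

    /** Returns a new crop box whose bottom edge is raised by footerHeight; the input is untouched. */
    static Box hideFooter(Box cropBox, float footerHeight) {
        if (footerHeight < 0 || footerHeight >= cropBox.height()) {
            throw new IllegalArgumentException("footerHeight must be in [0, box height)");
        }
        // Offset from the existing bottom edge instead of hard-coding an absolute y value.
        return new Box(cropBox.llx, cropBox.lly + footerHeight, cropBox.urx, cropBox.ury);
    }

    public static void main(String[] args) {
        Box media = new Box(0f, 0f, 612f, 792f);           // US Letter, as an example
        Box crop = hideFooter(effectiveCropBox(media, null), 50f);
        System.out.println("MediaBox height: " + media.height());
        System.out.println("CropBox height:  " + crop.height());
    }
}
```

The MediaBox is never modified here, so the declared page size stays the same while the visible region shrinks by the footer strip. With PDFBox, the computed corners would be copied into a PDRectangle and passed to setCropBox, as in the question's snippet.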

I have seen (faulty) applications and libraries that force the CropBox and the MediaBox to be the same; double-check that this is not what is happening in your case.

Also take into account that the coordinate origin (0,0) in PDF is the bottom-left corner. Some libraries translate to a top-left origin for you and others do not, so you may want to double-check this for the library you are using.
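If your footer position is measured from the top of the page, as in most screen coordinate systems, the conversion to PDF's bottom-left convention is a simple subtraction. The following is a small sketch, not a library API: `OriginSketch`, `toPdfY` and `rectLowerLeftY` are hypothetical helper names, and `pageTopY` is assumed to be the upper-right y of the page's box.

```java
// Converts y coordinates measured downward from the top edge
// into PDF user space, where y grows upward from the bottom.
final class OriginSketch {

    /** y of a point given as a distance below the top edge of the page. */
    static float toPdfY(float pageTopY, float yFromTop) {
        return pageTopY - yFromTop;
    }

    /** Lower-left y of a rectangle whose top edge is yFromTop below the page top. */
    static float rectLowerLeftY(float pageTopY, float yFromTop, float rectHeight) {
        return pageTopY - yFromTop - rectHeight;
    }

    public static void main(String[] args) {
        float pageTopY = 792f; // top edge of a US Letter page whose box starts at y = 0
        // A 50-unit footer strip that starts 742 units below the top edge.
        System.out.println("footer top in PDF space:    " + toPdfY(pageTopY, 742f));
        System.out.println("footer bottom in PDF space: " + rectLowerLeftY(pageTopY, 742f, 50f));
    }
}
```

Getting this direction wrong is an easy way to end up cropping the header instead of the footer, so it is worth checking one known point before applying a crop to every page.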
