
Actually cropping a PDF with PDF Clown

My objective is to actually crop a PDF file with PdfClown. Many tools and libraries allow cropping a PDF by changing its CropBox. This hides the content outside a rectangular area, but the content is still there: it can be accessed through a PDF parser, and the file size does not change.

What I need instead is to create a new page containing only the content inside the rectangular area.

So far I've tried scanning the contents and selectively cloning them, but I haven't succeeded yet. Any suggestions on using PdfClown for that?
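To make the "scan and selectively clone" idea concrete, here is a minimal sketch of the selection test only. It is not PdfClown code: `Rect`, `intersects` and `select_inside` are hypothetical names, and it assumes the bounding box of each content object has already been obtained from whatever scanner is in use.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in PDF user space (lower-left, upper-right)."""
    llx: float
    lly: float
    urx: float
    ury: float


def intersects(a: Rect, b: Rect) -> bool:
    """Return True when the two rectangles share a region of positive area."""
    return a.llx < b.urx and b.llx < a.urx and a.lly < b.ury and b.lly < a.ury


def select_inside(boxes: List[Rect], crop: Rect) -> List[Rect]:
    """Keep only the boxes that overlap the crop rectangle (hypothetical helper)."""
    return [box for box in boxes if intersects(box, crop)]
```

Objects that only partly overlap the crop rectangle are kept by this test, so they would still need clipping; that is the part that makes a true crop hard.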

I've seen that someone is trying something similar with PDFBox (Cropping a region from a PDF page with PDFBox), without success so far.

A bit late, but maybe it helps someone: I am successfully doing what you are asking for, but with other libraries. Required libraries: iText 4 or 5, and Ghostscript.

Step 1, with pseudo code:

Using iText, create a PdfWriter instance with a blank Document. Open a PdfReader on the original file you want to crop. Import the page, get a PdfTemplate object from the source, set its BoundingBox property to the desired crop box, wrap the template in an iText Image object, and paste it onto the new page at an absolute position.

' namespaces used below (iTextSharp, the .NET port of iText)
Imports iTextSharp.text
Imports iTextSharp.text.pdf

' sourcefile, outputfilename, indexOfPageToGet, cropRect, x and y are placeholders you supply
Dim reader As New PdfReader(sourcefile)
Dim doc As New Document()
Dim writer As PdfWriter = PdfWriter.GetInstance(doc, New System.IO.FileStream(outputfilename, System.IO.FileMode.Create))
' the document must be opened before content can be added to it
doc.Open()

' get the source page as an imported page
Dim page As PdfImportedPage = writer.GetImportedPage(reader, indexOfPageToGet)

' create a PdfTemplate object at the original size of the source page - see the iText in Action book, page 91, for full details
Dim pdftemp As PdfTemplate = page.CreateTemplate(page.Width, page.Height)
' paste the original page onto the template object; see the iText documentation for what those parameters do (scaling, mirroring)
pdftemp.AddTemplate(page, 1, 0, 0, 1, 0, 0)
' now the critical part - set the BoundingBox property on the template. This makes all objects outside the rectangle invisible
pdftemp.BoundingBox = cropRect ' an iTextSharp.text.Rectangle describing the new crop box
' template not needed anymore
writer.ReleaseTemplate(pdftemp)
' create an iText Image object as a wrapper around the template - with this img object, absolute positioning on the final page is much easier
Dim img As iTextSharp.text.Image = iTextSharp.text.Image.GetInstance(pdftemp)
' set img position
img.SetAbsolutePosition(x, y)
' set optional rotation if needed
img.RotationDegrees = 0
' finally, this adds the actual content to the new document
doc.Add(img)
' cleanup
doc.Close()
reader.Close()
writer.Close()

The output file will look cropped, but the objects are still present in the PDF stream. The file size will probably be almost unchanged.
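As a side note on choosing the crop rectangle and the position in Step 1, the arithmetic can be sketched as follows. This is plain coordinate geometry, not an iText call: `crop_placement` is a hypothetical helper, and whether the iText Image wrapper already applies part of this offset for a given bounding box is an assumption I have not verified, so check the result visually.

```python
from typing import Tuple


def crop_placement(llx: float, lly: float, urx: float, ury: float
                   ) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Return ((width, height), (dx, dy)) for a crop rectangle.

    (width, height) is the size of a page that exactly fits the crop;
    (dx, dy) is the translation that moves the crop's lower-left corner
    to the origin of that page.
    """
    if urx <= llx or ury <= lly:
        raise ValueError("crop rectangle must have positive width and height")
    return (urx - llx, ury - lly), (-llx, -lly)


size, offset = crop_placement(100, 150, 400, 550)
print(size, offset)  # (300, 400) (-100, -150)
```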

Step 2:

Using Ghostscript with the pdfwrite output device and the correct command line parameters, you can re-process the PDF from Step 1. This will give you a much smaller PDF. See the Ghostscript documentation for the arguments (https://www.ghostscript.com/doc/9.52/Use.htm). This step actually gets rid of the objects that are outside the bounding box, which is the requirement in your original post, at least for the files that I deal with.
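For reference, a minimal sketch of how Step 2 could be driven from a script. The switches shown (-dBATCH, -dNOPAUSE, -dSAFER, -sDEVICE=pdfwrite, -sOutputFile) are standard Ghostscript options, but they are an assumption about a reasonable minimal set, not necessarily the exact parameters used for the files described above. The function names and the default executable name `gs` are also assumptions (on Windows the console executable is typically `gswin64c`).

```python
import subprocess
from typing import List


def build_gs_command(src: str, dst: str, gs_executable: str = "gs") -> List[str]:
    """Build a Ghostscript command line that rewrites src through pdfwrite."""
    return [
        gs_executable,
        "-dBATCH",            # exit after processing instead of prompting
        "-dNOPAUSE",          # do not pause between pages
        "-dSAFER",            # restrict file access from the input file
        "-sDEVICE=pdfwrite",  # the PDF output device
        f"-sOutputFile={dst}",
        src,
    ]


def reprocess_with_ghostscript(src: str, dst: str, gs_executable: str = "gs") -> None:
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_gs_command(src, dst, gs_executable), check=True)
```

Calling `reprocess_with_ghostscript("cropped.pdf", "small.pdf")` would then produce the re-processed file, provided Ghostscript is installed and on the PATH.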

Optional Step 3: Using mutool clean with the -g option, you can clean up unused xref objects. Your original PDF probably had a lot of xref entries, which increase the file size. After cropping, some of them may not be needed anymore. https://mupdf.com/docs/manual-mutool-clean.html
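A similar sketch for the optional Step 3, assuming `mutool` is installed and on the PATH. The invocation follows the documented form `mutool clean [options] input.pdf [output.pdf]`, where repeating the `g` (for example `-gg` or `-ggg`) asks for more aggressive garbage collection; the function names here are hypothetical.

```python
import subprocess
from typing import List


def build_mutool_clean_command(src: str, dst: str, garbage_level: int = 1) -> List[str]:
    """Build a 'mutool clean' command line with -g repeated garbage_level times."""
    if not 1 <= garbage_level <= 4:
        raise ValueError("garbage_level must be between 1 and 4")
    return ["mutool", "clean", "-" + "g" * garbage_level, src, dst]


def clean_with_mutool(src: str, dst: str, garbage_level: int = 1) -> None:
    """Run mutool clean; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_mutool_clean_command(src, dst, garbage_level), check=True)
```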

PDF is a tricky format. Normally I would agree with @Tilman Hausherr; my suggestion may not work for all files, and it covers the 'almost impossible' scenario, but it works for all cases that I deal with.
