Cropping pages of a .pdf file

I was wondering if anyone had any experience in working programmatically with .pdf files. I have a .pdf file and I need to crop every page down to a certain size.

After a quick Google search I found the pyPdf library for Python, but my experiments with it failed. When I changed the cropBox and trimBox attributes on a page object, the results were not what I had expected and appeared to be quite random.

Has anyone had any experience with this? Code examples would be much appreciated, preferably in Python.

pyPdf does what I expect in this area. Using the following script:

#!/usr/bin/python
#

from pyPdf import PdfFileWriter, PdfFileReader

with open("in.pdf", "rb") as in_f:
    input1 = PdfFileReader(in_f)
    output = PdfFileWriter()

    numPages = input1.getNumPages()
    print "document has %s pages." % numPages

    for i in range(numPages):
        page = input1.getPage(i)
        print page.mediaBox.getUpperRight_x(), page.mediaBox.getUpperRight_y()
        page.trimBox.lowerLeft = (25, 25)
        page.trimBox.upperRight = (225, 225)
        page.cropBox.lowerLeft = (50, 50)
        page.cropBox.upperRight = (200, 200)
        output.addPage(page)

    with open("out.pdf", "wb") as out_f:
        output.write(out_f)

The resulting document has a trim box that is 200x200 points and starts at 25,25 points inside the media box. The crop box is 25 points inside the trim box.
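The arithmetic behind those two statements can be checked without pyPdf. This is only a sketch: the boxes are plain (lower-left x, lower-left y, upper-right x, upper-right y) tuples copied from the script, and the helper names box_size and inset are hypothetical, not part of pyPdf.

```python
# Boxes as (lower_left_x, lower_left_y, upper_right_x, upper_right_y), in points.
# The values are the ones assigned in the script above.
trim_box = (25, 25, 225, 225)
crop_box = (50, 50, 200, 200)


def box_size(box):
    """Return (width, height) of a box."""
    llx, lly, urx, ury = box
    return (urx - llx, ury - lly)


def inset(outer, inner):
    """Return how far inner sits inside outer: (left, bottom, right, top)."""
    return (
        inner[0] - outer[0],
        inner[1] - outer[1],
        outer[2] - inner[2],
        outer[3] - inner[3],
    )


print(box_size(trim_box))         # (200, 200)
print(inset(trim_box, crop_box))  # (25, 25, 25, 25)
```

A negative value from inset would mean the crop box sticks out of the trim box on that side.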

Here is how my sample document looks in Acrobat Professional after processing with the above code: [screenshot of the cropped page]

This document will appear blank when loaded in Acrobat Reader.

Use this to get the dimensions of the PDF:

from PyPDF2 import PdfFileWriter,PdfFileReader,PdfFileMerger

pdf_file = PdfFileReader(open("/Users/user.name/Downloads/sample.pdf","rb"))
page = pdf_file.getPage(0)
print(page.cropBox.getLowerLeft())
print(page.cropBox.getLowerRight())
print(page.cropBox.getUpperLeft())
print(page.cropBox.getUpperRight())

After this, get the page reference and then apply the crop:

page.mediaBox.lowerRight = (lower_right_new_x_coordinate, lower_right_new_y_coordinate)
page.mediaBox.lowerLeft = (lower_left_new_x_coordinate, lower_left_new_y_coordinate)
page.mediaBox.upperRight = (upper_right_new_x_coordinate, upper_right_new_y_coordinate)
page.mediaBox.upperLeft = (upper_left_new_x_coordinate, upper_left_new_y_coordinate)

#for example :- my custom coordinates 
#page.mediaBox.lowerRight = (611, 500)
#page.mediaBox.lowerLeft = (0, 500)
#page.mediaBox.upperRight = (611, 700)
#page.mediaBox.upperLeft = (0, 700)
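The four corner assignments above describe a single rectangle, so they have to agree with each other. As a rough sketch, the four tuples can be derived from the left, bottom, right and top edges; the helper name corners is hypothetical and is not part of PyPDF2.

```python
def corners(left, bottom, right, top):
    """Return the four corner tuples of a rectangle, keyed by the box attribute names."""
    return {
        "lowerLeft": (left, bottom),
        "lowerRight": (right, bottom),
        "upperLeft": (left, top),
        "upperRight": (right, top),
    }


# Same rectangle as the custom coordinates in the comment above.
c = corners(0, 500, 611, 700)
for name in ("lowerRight", "lowerLeft", "upperRight", "upperLeft"):
    print(name, c[name])
```

Each tuple could then be assigned to the page.mediaBox attribute of the same name, as in the answer.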

You are probably looking for a free solution, but if you have money to spend, PDFlib is a fabulous library. It has never disappointed me.

How do I know the coordinates to crop?

Thanks for all answers above.感谢上面的所有答案。

Step 1. Run the following code to get (x1, y1).

from PyPDF2 import PdfFileWriter, PdfFileReader

input = PdfFileReader(open("test.pdf","rb"))
page = input.getPage(0)
print(page.cropBox.getUpperRight())

Step 2. View the pdf file in full screen mode.

Step 3. Capture the screen as an image file screen.jpg.

Step 4. Open screen.jpg in MS Paint or GIMP. These applications show the coordinates of the cursor.

Step 5. Note the following coordinates: (x2, y2), (x3, y3), (x4, y4) and (x5, y5), where (x4, y4) and (x5, y5) determine the rectangle you want to crop.

Step 6. Get page.cropBox.upperLeft and page.cropBox.lowerRight by the following formulas. Here is a tool for calculating.

page.cropBox.upperLeft = (x1*(x4-x2)/(x3-x2),(1-y4/y3)*y1)
page.cropBox.lowerRight = (x1*(x5-x2)/(x3-x2),(1-y5/y3)*y1)
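The two formulas can be wrapped in one small function. This is a sketch under my reading of them: x2 and x3 are the screen x positions of the page's left and right edges, the page spans the screen height from 0 to y3, and the page's lower-left corner is at (0, 0) in PDF points. The function name screen_to_pdf and all the numbers below are made up for illustration.

```python
def screen_to_pdf(x, y, x1, y1, x2, x3, y3):
    """Map a screen pixel (x, y) to PDF points with the Step 6 formulas.

    (x1, y1) is the upper-right corner of the crop box in points (Step 1);
    x2, x3 and y3 are the pixel coordinates noted in Step 5.
    """
    pdf_x = x1 * float(x - x2) / (x3 - x2)
    pdf_y = (1 - float(y) / y3) * y1  # screen y grows downward, PDF y grows upward
    return (pdf_x, pdf_y)


# Hypothetical measurements: a 600x800 pt page shown 750 px wide on a
# 1000 px high screen, starting 100 px from the left edge.
x1, y1 = 600, 800
x2, x3, y3 = 100, 850, 1000

upper_left = screen_to_pdf(250, 250, x1, y1, x2, x3, y3)   # (x4, y4)
lower_right = screen_to_pdf(700, 750, x1, y1, x2, x3, y3)  # (x5, y5)
print(upper_left)   # (120.0, 600.0)
print(lower_right)  # (480.0, 200.0)
```

The float() calls keep the division exact if the script is run under Python 2, where dividing two integers would truncate.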

Step 7. Run the following code to crop the pdf file.

from PyPDF2 import PdfFileWriter, PdfFileReader

output = PdfFileWriter() 
input = PdfFileReader(open('test.pdf', 'rb')) 

n = input.getNumPages()

for i in range(n):
  page = input.getPage(i)
  page.cropBox.upperLeft = (100,200)
  page.cropBox.lowerRight = (300,400)
  output.addPage(page) 
  
outputStream = open('result.pdf','wb') 
output.write(outputStream) 
outputStream.close() 

You can convert the PDF to PostScript (pdftops or pdf2ps) and then use text processing on the PostScript file. After that you can convert the output back to PDF (pstopdf or ps2pdf).

This works nicely if the PDFs you want to process are all generated by the same application and are somewhat similar. If they come from different sources, it is usually too hard to process the PostScript files because the structure varies too much. But even then you might be able to fix page sizes and the like with a few regular expressions.
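As an illustration of the regular-expression idea, here is a minimal sketch that rewrites the %%BoundingBox comment in a PostScript file's text. It assumes the file carries that comment with four integer values; real files may instead set the page size with setpagedevice or defer the comment with (atend), and whether the converter back to PDF honors the comment depends on the tool and its options. The function name set_bounding_box and the sample text are hypothetical.

```python
import re

# Matches a "%%BoundingBox: llx lly urx ury" comment line with integer values.
BBOX_RE = re.compile(
    r"^%%BoundingBox: *-?\d+ +-?\d+ +-?\d+ +-?\d+ *$", re.MULTILINE
)


def set_bounding_box(ps_text, llx, lly, urx, ury):
    """Return ps_text with every matching %%BoundingBox line replaced."""
    new_line = "%%BoundingBox: {} {} {} {}".format(llx, lly, urx, ury)
    return BBOX_RE.sub(new_line, ps_text)


# Made-up header of a one-page PostScript file.
sample = "%!PS-Adobe-3.0\n%%BoundingBox: 0 0 612 792\n%%Pages: 1\n%%EndComments\n"
result = set_bounding_box(sample, 50, 50, 200, 200)
print(result)
```

Files from a different generator may need a different pattern, which is the limitation described above.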

The Acrobat JavaScript API has a setPageBoxes method, but Adobe doesn't provide any Python code samples, only C++, C# and VB.

Cropping pages of a .pdf file

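from PIL import Image

def ImageCrop():
    # Crop a page that has already been rendered to an image.
    img = Image.open("page_1.jpg")
    # Crop rectangle in pixels, measured from the top-left corner of the image.
    left = 90
    top = 580
    right = 1600
    bottom = 2000
    img_res = img.crop((left, top, right, bottom))
    # Output path; the file name here is a placeholder.
    outfile4 = "page_1_cropped.jpg"
    img_res.save(outfile4, 'JPEG')

ImageCrop()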
