How do I get a specific part of a page in a PDF and save it to a new PDF in Python?

I have very little experience manipulating PDFs with Python; so far I have only read them using the 'pdfreader' library. I have a PDF (in this case a past exam paper), and I want to split a page where a question number appears, say 12 for this example (it would be formatted "12."), and save the part containing question 12 to a new PDF. How do I do this?

I'm not a very good programmer, so sorry if my question is stupid, but searching on the internet I could not find how to do this.

The solution in the end was to convert the PDF page into an image, crop it where I wanted, and then convert it back to a PDF. To get the coordinates I had to use PDFMiner. To get the pixel positions for cropping the image, I set up a proportion between the height of the page in PDF coordinates and the height in pixels of the image I wanted to create, so that I could convert coordinates from one system into the other.
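The proportion step can be sketched roughly as follows. This is not the code I actually used, just a minimal illustration with some assumptions: the text lines and their y positions are taken to be already extracted (for example with PDFMiner) into a hypothetical list of `(text, y_top)` pairs, y is measured in PDF points from the bottom of the page, and the helper names `question_band` and `pdf_band_to_pixel_box` are made up for this example. The regex for "12." style headings is a heuristic and would also match a line that happens to start with something like "3.5".

```python
import re
from typing import List, Optional, Tuple

# Hypothetical input: (text, y_top) for each text line on the page, with
# y_top in PDF points measured from the BOTTOM of the page (assumption).
Line = Tuple[str, float]

# Heuristic: a question heading is a line starting with digits and a dot.
QUESTION_RE = re.compile(r"^\s*(\d+)\.")


def question_band(lines: List[Line], number: int) -> Optional[Tuple[float, float]]:
    """Return (y_top, y_bottom) in PDF points for the given question.

    The band runs from the top of the line starting with "<number>." down
    to the top of the next numbered line, or to the page bottom (0.0) if
    no later question is found. Returns None if the question is absent.
    """
    # Sort top-to-bottom: a larger y is higher on the page.
    ordered = sorted(lines, key=lambda item: item[1], reverse=True)
    top = None
    for text, y_top in ordered:
        match = QUESTION_RE.match(text)
        if not match:
            continue
        if top is None:
            if int(match.group(1)) == number:
                top = y_top
        else:
            # First numbered line after the target closes the band.
            return top, y_top
    return (top, 0.0) if top is not None else None


def pdf_band_to_pixel_box(
    band: Tuple[float, float],
    page_height_pt: float,
    image_width_px: int,
    image_height_px: int,
) -> Tuple[int, int, int, int]:
    """Convert a vertical band in PDF points to a full-width pixel box.

    Uses the proportion image_height_px / page_height_pt and flips the
    y axis, since image rows are counted from the top of the image.
    """
    scale = image_height_px / page_height_pt
    y_top_pt, y_bottom_pt = band
    upper = round((page_height_pt - y_top_pt) * scale)
    lower = round((page_height_pt - y_bottom_pt) * scale)
    return (0, upper, image_width_px, lower)


if __name__ == "__main__":
    # Made-up sample data for an 800 pt tall page rendered at 1000x1600 px.
    sample = [
        ("11. First question", 700.0),
        ("12. Second question", 600.0),
        ("some working", 580.0),
        ("13. Third question", 400.0),
    ]
    band = question_band(sample, 12)
    if band is not None:
        print(pdf_band_to_pixel_box(band, 800.0, 1000, 1600))
```

The returned tuple is in (left, upper, right, lower) order, which is the order Pillow's `Image.crop` expects, so it should be usable for cropping the rendered page image before converting the crop back to a PDF.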
