Edit a few lines of an uncompressed PDF in Python

I want to edit a few lines in an uncompressed PDF. I found a similar question, but it doesn't really fit my case: I need to scan the file several times to find the exact line positions I want to change, and the sheer number of regex matches is more than I'd like. The PDF contains UTF-8-decodable lines (a few of which I want to edit, bookmark target IDs in particular) and a lot of binary blobs (images and so on, I guess). When I edit the file in Notepad it works fine, but when I do it programmatically (reading in, changing a few lines, writing back), images and some formatting are missing. (They are never read in to begin with, because of the errors='ignore' option.)

with codecs.open("merged-uncompressed.pdf", "r", encoding='ascii', errors='ignore') as f:

I can read the file in with errors="surrogateescape" and wanted to map the lines from the import above onto that, but I don't know whether this approach can work.
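For illustration, here is a minimal sketch of how the surrogateescape idea could work. It assumes the whole file fits in memory and is read in binary mode; the function name `edit_pdf_text` and the sample bytes are hypothetical. Bytes that are not valid ASCII are decoded to lone surrogate code points and turned back into the same bytes on encoding, so blobs should survive the round trip unchanged.

```python
def edit_pdf_text(raw: bytes, old: str, new: str) -> bytes:
    """Replace `old` with `new` in the text parts of `raw`, keeping blobs intact."""
    # Non-ASCII bytes become lone surrogates (U+DC80..U+DCFF) instead of being dropped.
    text = raw.decode("ascii", errors="surrogateescape")
    text = text.replace(old, new)
    # Encoding with the same error handler restores the original bytes.
    return text.encode("ascii", errors="surrogateescape")


# Hypothetical sample: one text line followed by a binary blob.
sample = b"/Dest [old-id]\n\x89\xff\x00blob\r\n"
edited = edit_pdf_text(sample, "old-id", "new-id")
```

Reading with `open(path, "rb")` rather than in text mode avoids newline translation, which would otherwise alter the blobs. Note that if a replacement changes the byte length, the byte offsets stored in the PDF's cross-reference table may no longer match, so a repair pass with a PDF tool may still be needed.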

Does anyone know how to deal with this?

Best, Lukas

I was able to solve this:

  1. Read the file as binary.
  2. Marked the lines that could not be decoded as UTF-8.
  3. Copied the list line by line into a temporary list (lines that could not be decoded were copied as the placeholder 'None\n').
  4. Went back and did the searching on the copied list, which gave me the lines I wanted to replace.
  5. Replaced those lines in the original binary list (same indices!).
  6. Wrote it back to the file.
  7. The resulting PDF was slightly corrupted because of whitespace before the bookmarks' target IDs, but recompressing it with qpdf fixed that :)
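Steps 1 to 6 could be sketched roughly as follows. This is not the original code, which was not published; the names `build_shadow` and `edit_lines` and the sample data are hypothetical. The sketch uses `None` itself as the placeholder instead of the string 'None\n', so a genuine text line can never be mistaken for it.

```python
from typing import Callable, List, Optional


def build_shadow(lines: List[bytes]) -> List[Optional[str]]:
    """Decode each line as UTF-8; lines that cannot be decoded become None."""
    shadow: List[Optional[str]] = []
    for line in lines:
        try:
            shadow.append(line.decode("utf-8"))
        except UnicodeDecodeError:
            shadow.append(None)  # placeholder for a binary line
    return shadow


def edit_lines(raw: bytes, edit: Callable[[str], Optional[str]]) -> bytes:
    """Apply `edit` to every decodable line; `edit` returns None to keep a line."""
    lines = raw.splitlines(keepends=True)  # keep line endings so joining restores the file
    shadow = build_shadow(lines)
    for i, text in enumerate(shadow):
        if text is None:
            continue  # binary line: leave the original bytes untouched
        replacement = edit(text)
        if replacement is not None:
            lines[i] = replacement.encode("utf-8")  # same index as in the shadow list
    return b"".join(lines)


# Hypothetical sample data: two text lines, a blob, and a CRLF-terminated line.
sample = b"/Title (A)\n/Dest [old]\n\xff\xfe\x00blob\n/Next\r\n"
result = edit_lines(sample, lambda s: s.replace("old", "new") if "/Dest" in s else None)
```

For a real file, `raw` would come from `open(path, "rb").read()` and `result` would be written back with `open(path, "wb")`. The search logic lives entirely in the `edit` callback, so several scans over the shadow list are possible without ever touching the binary lines.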

The code is very messy at the moment, so I don't want to publish it right now, but I plan to put it on GitHub within the next few weeks. If anyone needs it, just comment and I will give it higher priority.

Thanks to everyone who wanted to help :) Lukas


 