Editing a few lines of an uncompressed PDF in Python

Question

I want to edit a few lines in an uncompressed PDF. I found a similar question, but since I need to scan the file several times to get the exact positions of the lines I want to change, that solution doesn't really suit me (and the sheer number of regex matches is more than I'd like). The PDF contains lines that are valid UTF-8 (a few of them I want to edit, bookmark target IDs in particular) and a lot of binary blobs (images and so on, I guess). When I edit the file in Notepad it works fine, but when I do it programmatically (reading it in, changing a few lines, writing it back), images and some formatting are missing, because with the ignore option those bytes are never read in at all. This is how I read the file:

with codecs.open("merged-uncompressed.pdf", "r", encoding='ascii', errors='ignore') as f:
    lines = f.readlines()  # non-ASCII bytes (images etc.) are silently dropped here
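A minimal illustration of why the read above loses the images: with errors='ignore', every byte that isn't valid ASCII is discarded during decoding, so there is nothing left to write back. The sample bytes are made up and only stand in for a PDF line followed by image data.

```python
# Hypothetical PDF-like bytes: a text line followed by a few raw image bytes.
original = b"/Dest (target-1)\n\xff\xd8\xff\xe0 binary\n"

# Decoding with errors="ignore" silently drops every non-ASCII byte.
text = original.decode("ascii", errors="ignore")

# Encoding the text again cannot bring those bytes back.
round_trip = text.encode("ascii")

print(round_trip == original)           # False
print(len(original) - len(round_trip))  # 4 bytes lost
```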

I can read the file with errors="surrogateescape" instead, and wanted to map those lines back onto the lines from the read above, but I don't know whether this approach can work.
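On the surrogateescape idea: as far as I can tell, that approach can work on its own, without mapping lines between two reads, as long as the file is written back with the same encoding and errors="surrogateescape" (PEP 383 guarantees the UTF-8 round trip). Below is a minimal sketch, assuming the bookmark IDs are plain text and a simple str.replace is enough; the function name rewrite_with_surrogateescape and the sample bytes are hypothetical. newline="" is also needed so that text mode doesn't translate \r\n line endings.

```python
import os
import tempfile


def rewrite_with_surrogateescape(path: str, old: str, new: str) -> None:
    """Replace text in a file without losing bytes that aren't valid UTF-8."""
    # surrogateescape maps each undecodable byte to a lone surrogate
    # (U+DC80..U+DCFF) on read and back to the same byte on write.
    # newline="" turns off newline translation, so \r\n stays \r\n.
    with open(path, "r", encoding="utf-8", errors="surrogateescape", newline="") as f:
        lines = f.readlines()

    lines = [line.replace(old, new) for line in lines]

    with open(path, "w", encoding="utf-8", errors="surrogateescape", newline="") as f:
        f.writelines(lines)


# Usage with a throwaway file containing a text line and some binary bytes.
data = b"/Dest (old-id)\r\n\xff\xd8\xff\xe0\x00\x10JFIF\r\n"
fd, path = tempfile.mkstemp(suffix=".pdf")
with os.fdopen(fd, "wb") as f:
    f.write(data)

rewrite_with_surrogateescape(path, "old-id", "new-id")

with open(path, "rb") as f:
    result = f.read()
os.remove(path)

print(result == data.replace(b"old-id", b"new-id"))  # True
```

The binary bytes pass through untouched because they never have to be valid text; only the lines containing the search string change.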

Does anyone know how to deal with this?

Best, Lukas

Answer 1

I was able to solve this:

Read the file as binary.
Marked the lines that couldn't be decoded as UTF-8.
Copied the list line by line to a temporary list (lines that couldn't be decoded were copied as the placeholder 'None\n').
Went back and did the searching on the copied list, which gave me the lines I wanted to replace.
Replaced those lines in the original binary list (same indices!).
Wrote it back to the file.
The resulting PDF was slightly corrupted because of whitespace before the target IDs of the bookmarks, but recompressing it with qpdf fixed that :)
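The steps above can be sketched roughly as follows. This is not Lukas's code (which wasn't published); the function name replace_lines_binary and the sample bytes are hypothetical, and None is used instead of the string 'None\n' as the placeholder so it can never collide with a real line. File I/O is left out: read the PDF with open(path, "rb").read() and write the result back in "wb" mode.

```python
import re
from typing import List, Optional


def replace_lines_binary(data: bytes, pattern: str, replacement: str) -> bytes:
    """Patch UTF-8 text lines inside otherwise binary data, leaving other bytes untouched."""
    # Split the raw bytes into lines, keeping the line endings so the
    # file can be reassembled byte for byte.
    raw_lines: List[bytes] = data.splitlines(keepends=True)

    # Build a parallel list of decoded lines. Lines that aren't valid UTF-8
    # (image data and other blobs) get None as a placeholder, so every
    # index in text_lines refers to the same line in raw_lines.
    text_lines: List[Optional[str]] = []
    for raw in raw_lines:
        try:
            text_lines.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            text_lines.append(None)

    # Search only the decodable lines and write each changed line back
    # into the binary list at the same index.
    regex = re.compile(pattern)
    for i, line in enumerate(text_lines):
        if line is None:
            continue
        new_line, count = regex.subn(replacement, line)
        if count:
            raw_lines[i] = new_line.encode("utf-8")

    return b"".join(raw_lines)


# Usage with hypothetical bytes: one bookmark line plus a JPEG-like blob.
sample = b"<< /D (old-target) >>\n\xff\xd8\xff\xe0\x00\x10JFIF\n"
patched = replace_lines_binary(sample, r"old-target", "new-target")
print(patched)
```

Binary lines that happen to be valid UTF-8 are searched as well, so a specific pattern helps. Also, any replacement that changes a line's length shifts the byte offsets recorded in the PDF's cross-reference table; that is presumably part of why rewriting the file with qpdf (for example qpdf in.pdf out.pdf) repaired it, since qpdf writes a fresh cross-reference table.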

The code is very messy at the moment, so I don't want to publish it yet, but I plan to put it on GitHub within the next few weeks. If anyone needs it, just comment and I'll give it more priority.

Thanks to everyone who wanted to help :) Lukas

(Answer 1: score 0, posted 2021-04-18 11:37:48)