
How can I add an interactive "table of contents" to a scanned pdf?

I'm trying to go from a paper document to a searchable pdf with a table of contents.

Sometimes you will download a pdf book or document (for example, the Intel Manual shown below). This document is searchable and it also has a table of contents. Now, when you put this same document on Google Drive and then open it up with PDF Expert on an iPad, it is still searchable with a table of contents. This is what I'd like to do with all my scanned pdfs. [image]

Now a more concrete example. Shown below is a document that I've scanned with the Fujitsu ScanSnap. It's also searchable thanks to some software that comes with the ScanSnap. So now I have a searchable pdf that can be opened up locally or on my iPad, but it doesn't have a table of contents. So my main question is: How can I add a table of contents like the one in the Intel Manual to a scanned pdf? [image]

It seems like there are tons of people doing different things with "table of contents". For example, people who design documents use InDesign. I think that what I'm trying to do must be simpler than that. I'm thinking there has to be an easy way to do this using, say, Adobe Acrobat Pro? Something about adding "bookmarks" or "links" or "tags" to the existing table of contents. Do you know of a clear and concise way to do this using Acrobat or some other software?

Thanks for the help.

I have done this before by combining multiple "booklets". Each "Chapter" was a series of pages combined in Adobe Acrobat Pro. I would combine chapters into separate "booklets", give each one a chapter name, and then combine all chapters into a new booklet.

Jpdfbookmark can work for scanned books.

Watch tutorial video ≫

Step 1: Prepare the table of contents

Save the TOC in a .txt file in this format:

Chapter 1. The Beginning/23
    Para 1.1 Child of The Beginning/25,FitWidth,96
        Para 1.1.1 Child of Child of The Beginning/26,FitHeight,43
Chapter 2. The Continue/30,TopLeft,120,42
    Para 2.1 Child of The Beginning/32,FitPage

You can OCR the TOC and use regex to fix it.
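As a rough illustration of the regex cleanup, here is a minimal Python sketch. It assumes each OCR'd line looks like "title, dot leaders or spaces, page number"; real OCR output is usually messier and will need extra rules. The function name `ocr_toc_to_bookmarks` and the sample text are made up for this example. Only the `Title/page` output shape comes from the format shown above.

```python
import re

# Assumed OCR line shape: optional indent, title, dot leaders or spaces, page number.
TOC_LINE = re.compile(r"^(?P<indent>\s*)(?P<title>.+?)[ .\t]+(?P<page>\d+)\s*$")


def ocr_toc_to_bookmarks(ocr_text: str) -> str:
    """Turn 'Title ..... 23' lines into 'Title/23' lines, keeping the indent."""
    out = []
    for line in ocr_text.splitlines():
        m = TOC_LINE.match(line)
        if m is None:
            continue  # skip blank lines and lines without a trailing page number
        out.append(f"{m['indent']}{m['title']}/{m['page']}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "Chapter 1. The Beginning ..... 23\n"
        "    Para 1.1 Child of The Beginning . . . 25\n"
        "\n"
        "Chapter 2. The Continue 30"
    )
    print(ocr_toc_to_bookmarks(sample))
```

The indentation is copied through unchanged, so nesting still has to be fixed by hand if the OCR step flattened it.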

Step 2: Load that TOC

Step 3: Prepare for step 4

This sounds dumb, but if you miss it you will be frustrated and have to do it again. Expand all bookmarks (Ctrl + E), select all of them, then go to Tools → Apply Page Offset.

Step 4: Apply page offset

This step should be self-explanatory. Don't forget to save.
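The offset is needed because the page numbers printed in a book's TOC usually differ from the page positions in the PDF file (cover, preface and so on come first). If you would rather shift the numbers in the .txt file before loading it, the following Python sketch does the same arithmetic. It is not part of Jpdfbookmark. The helper name `apply_page_offset` is hypothetical, and it assumes every bookmark line has the `Title/page[,view options]` shape shown in step 1.

```python
def apply_page_offset(bookmarks_text: str, offset: int) -> str:
    """Add `offset` to the page number of every 'Title/page[,options]' line."""
    shifted = []
    for line in bookmarks_text.splitlines():
        # Split on the last '/', so a '/' inside the title is left alone.
        title, sep, target = line.rpartition("/")
        page, comma, view = target.partition(",")
        if not sep or not page.strip().isdigit():
            shifted.append(line)  # leave lines without a page target unchanged
            continue
        shifted.append(f"{title}/{int(page) + offset}{comma}{view}")
    return "\n".join(shifted)


if __name__ == "__main__":
    toc = (
        "Chapter 1. The Beginning/23\n"
        "    Para 1.1 Child of The Beginning/25,FitWidth,96"
    )
    # Assumed example: printed page 1 is the 13th page of the PDF, so offset = 12.
    print(apply_page_offset(toc, 12))
```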


That's it. You are done. For more information, you can read its manual. The program has a command-line mode and can work on Linux and Mac.

If there are non-Roman characters, be sure to use the same encoding when dumping and applying bookmarks.

I also have a complete guide to processing scanned books, which you may want to check out: The ultimate guide to process scanned books.


FYI:
How to OCR tables of contents to proper outputs?
How can I split in half a double-page scanned PDF in a single pass?
