Splitting a PDF with PDFBox, but losing fonts

Question

I wrote some Java code using the PDFBox API that splits a PDF document into its individual pages, searches the pages for a specific string, and then creates a new PDF from the page containing that string. My problem is that when the new page is saved, I lose my font. I made a quick Word document to test it, and the default font was Calibri. When I run the program I get an error box that reads: "Cannot extract the embedded font...", and the font is replaced with some other default.

I have seen a lot of example code that shows how to change the font when you are adding text to a PDF, but nothing that sets the font for the PDF itself.

If anyone knows a way to do this (or can point me to documentation or examples), I would greatly appreciate it!

Edit: I forgot to include some sample code:

if (pageContent.indexOf(findThis) >= 0) {
    PDPage pageToRip = pages.get(i);
    // >> set the font of pageToRip here (this is the part I don't know how to do)
    res.importPage(pageToRip); // res is the new document that will be saved
}

I don't know if that helps, but I figured I'd include it.

Also, this is what the change looks like when a PDF written in Calibri is split:

[Screenshot. Left: Calibri. Right: after the change.]

Note: This might be a non-issue; it depends on the fonts used in the files that will need to be processed. I tried a few fonts other than Calibri and those worked fine.
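For reference, the split, search, and import process described in the question can be sketched with PDFBox as below. This assumes PDFBox 2.0.x on the classpath (1.8.x, which the question's `pages.get(i)` suggests, uses different package names and page-access calls, and 3.x replaces `PDDocument.load` with `Loader.loadPDF`); the class name `ExtractMatchingPages` and method `extract` are hypothetical. The sketch differs from the question's snippet in two ways that are often suggested for missing-font symptoms after splitting, though nothing in this thread confirms either as the fix here: the source document stays open until the new document has been saved, because imported pages can still refer to objects such as embedded font streams that belong to the source, and the page's resources (which may be inherited from the page tree rather than stored on the page) are copied explicitly with `setResources`.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.text.PDFTextStripper;

/** Sketch (PDFBox 2.0.x assumed): copy every page containing a search string into a new PDF. */
public class ExtractMatchingPages {

    /**
     * Saves the matching pages of {@code source} to {@code output} and returns how many were copied.
     * The caller must keep {@code source} open until this method returns, because the imported
     * pages may still reference objects (e.g. embedded font streams) owned by the source document.
     */
    public static int extract(PDDocument source, String findThis, File output) throws IOException {
        PDFTextStripper stripper = new PDFTextStripper();
        int copied = 0;
        try (PDDocument result = new PDDocument()) {
            for (int i = 0; i < source.getNumberOfPages(); i++) {
                // PDFTextStripper page numbers are 1-based; getPage(i) is 0-based.
                stripper.setStartPage(i + 1);
                stripper.setEndPage(i + 1);
                String pageContent = stripper.getText(source);
                if (pageContent.contains(findThis)) {
                    PDPage pageToRip = source.getPage(i);
                    PDPage imported = result.importPage(pageToRip);
                    // Fonts live in the page's resource dictionary, which can be inherited
                    // from a parent node of the page tree; copy it onto the new page explicitly.
                    imported.setResources(pageToRip.getResources());
                    copied++;
                }
            }
            result.save(output);
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ExtractMatchingPages in.pdf "search text" out.pdf
        try (PDDocument source = PDDocument.load(new File(args[0]))) {
            int n = extract(source, args[1], new File(args[2]));
            System.out.println("Copied " + n + " page(s)");
        } // source is closed only here, after the result has been saved
    }
}
```

The ordering in `main` is the deliberate design choice: the source document is closed by the outer try-with-resources only after `extract` has written the new file.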

Answer 1

From "How to extract fonts from a PDF":

You actually cannot extract a font from a PDF, not even if the font is fully embedded. There are two reasons why this is not feasible:

• Most fonts are copyrighted, making it illegal to use an extractor.

• When a font is embedded in a PDF, not all of the font data are included. Obviously the font outline data are included, as well as the font width tables. Other information, such as data about ligatures, is irrelevant within the PDF, so those data do not get enclosed in it. I am not aware of any font extraction tools, but if you come across one, the reasons above should make it clear that such utilities are to be avoided.
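One visible sign of partial embedding is the font name itself: the PDF specification (ISO 32000-1, section 9.6.4) requires that an embedded font subset's PostScript name begin with a tag of exactly six uppercase letters followed by a plus sign, as in `ABCDEF+Calibri`. The pure-Java check below is a minimal sketch of recognising that tag; the class `FontSubsetTag` and its methods are hypothetical, and in a PDFBox program the name would typically come from something like `PDFont.getName()`.

```java
/** Sketch: recognise the subset tag that PDF producers prepend to a partially embedded font's name. */
public final class FontSubsetTag {

    private FontSubsetTag() {}

    /** True if the name starts with exactly six uppercase ASCII letters followed by '+'. */
    public static boolean isSubset(String baseFontName) {
        // Require six letters, '+', and at least one character of the real font name.
        if (baseFontName == null || baseFontName.length() < 8) {
            return false;
        }
        for (int i = 0; i < 6; i++) {
            char c = baseFontName.charAt(i);
            if (c < 'A' || c > 'Z') {
                return false;
            }
        }
        return baseFontName.charAt(6) == '+';
    }

    /** Strips the subset tag if present, returning the underlying font name. */
    public static String baseName(String baseFontName) {
        return isSubset(baseFontName) ? baseFontName.substring(7) : baseFontName;
    }

    public static void main(String[] args) {
        System.out.println(isSubset("ABCDEF+Calibri")); // true
        System.out.println(isSubset("Calibri"));        // false
        System.out.println(baseName("ABCDEF+Calibri")); // Calibri
    }
}
```

A font reported with such a tag contains only the glyphs the original document used, which is one concrete reason the embedded copy is not a complete, reusable font.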

Answer 1 (score 0, accepted; 2011-10-03 18:38:03)