
In PDFBox, why does the file size become extremely large after saving?

Question

I am using PDFBox 1.8.8 to manipulate existing PDF files. After saving a document, the output file is several times larger than the original, which is undesirable.

How can I reduce the size of the output files?

How to reproduce my situation

In the code below, PDFBox simply loads an existing PDF and then saves it; nothing else is done. Yet the file still becomes several times larger.

Below are links to two sample input files. For input1.pdf, the file size increases from 6 MB to 50 MB; for input2.pdf, it increases from 0.4 MB to 1.3 MB.

https://dl.dropboxusercontent.com/u/13566649/samplePDF/input1.pdf
https://dl.dropboxusercontent.com/u/13566649/samplePDF/input2.pdf

import java.io.*;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.exceptions.*;


class Test {

    public static void main(String[] args) throws IOException, COSVisitorException {

        // Load an existing PDF and write it straight back out; nothing is modified.
        PDDocument document = PDDocument.load("input1.pdf");
        document.save("output.pdf");
        document.close();
    }
}
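To put numbers on the growth without depending on PDFBox, a small helper like the one below can compare the input file with the file PDFBox wrote. `SizeCheck` is a hypothetical name, not part of PDFBox; it only uses `java.nio.file.Files.size`. With the figures quoted above, the growth factor is about 8.3x for input1.pdf (50 / 6) and 3.25x for input2.pdf (1.3 / 0.4).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of PDFBox): quantifies how much a PDF grew
// after being loaded and re-saved.
class SizeCheck {

    // Ratio of the new size to the old size; 1.0 means "unchanged".
    static double growthFactor(long before, long after) {
        if (before <= 0) {
            throw new IllegalArgumentException("original size must be positive");
        }
        return (double) after / before;
    }

    // Reads both file sizes from disk and returns the growth factor.
    static double compare(Path input, Path output) throws IOException {
        return growthFactor(Files.size(input), Files.size(output));
    }

    public static void main(String[] args) throws IOException {
        Path in = Paths.get(args.length > 0 ? args[0] : "input1.pdf");
        Path out = Paths.get(args.length > 1 ? args[1] : "output.pdf");
        System.out.printf("%s: %d bytes%n", in, Files.size(in));
        System.out.printf("%s: %d bytes%n", out, Files.size(out));
        System.out.printf("growth factor: %.2fx%n", compare(in, out));
    }
}
```

Usage: after running `Test`, run `java SizeCheck input1.pdf output.pdf`.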

What I have tried

I have tried using the addCompression() method of the PDStream class, as in the following code. It does not change anything; the output file size is still the same.

class Test2 {

    public static void main(String[] args) throws IOException, COSVisitorException {

        PDDocument document = PDDocument.load("input1.pdf");

        for (int i = 0; i < document.getNumberOfPages(); i++) {
            // getAllPages() returns a raw List in PDFBox 1.8, hence the cast.
            PDPage page = (PDPage) document.getDocumentCatalog().getAllPages().get(i);
            // getContents() is null for a page without a content stream.
            if (page.getContents() != null) {
                page.getContents().addCompression();
            }
        }

        document.save("output.pdf");
        document.close();
    }
}
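One possible explanation for why `addCompression()` has no visible effect, offered as an assumption (the sample files' streams have not been checked): in PDFBox 1.8, `PDStream.addCompression()` only adds a Flate filter when the stream has no filter yet, and page content streams in most PDFs already carry FlateDecode. Even forcing a second pass would gain almost nothing, because deflating already-deflated bytes barely shrinks them. The sketch below illustrates that last point with the JDK's `java.util.zip.Deflater`, whose default zlib output is the format PDF's FlateDecode filter uses. `FlateDemo` and its sample text are made up for the illustration and do not involve PDFBox.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Illustration only: PDF's FlateDecode is zlib/deflate, which is what
// java.util.zip.Deflater produces by default.
class FlateDemo {

    // Deflates the whole input and returns the compressed bytes.
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Builds repetitive text that loosely resembles a page content stream (made-up sample).
    static byte[] sampleContent(int lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            sb.append("BT /F1 12 Tf 72 ").append(700 - (i % 50) * 12)
              .append(" Td (Line ").append(i).append(") Tj ET\n");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] raw = sampleContent(2000);
        byte[] once = deflate(raw);   // like compressing an unfiltered stream
        byte[] twice = deflate(once); // like re-compressing an already-flated stream
        System.out.println("raw:   " + raw.length + " bytes");
        System.out.println("once:  " + once.length + " bytes");
        System.out.println("twice: " + twice.length + " bytes");
    }
}
```

Running `main` should show the first pass shrinking the text substantially and the second pass leaving it roughly the same size, or slightly larger because of the added zlib header and block overhead.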

Answer

I wrote this strange code, and it works for me (Apache PDFBox 2.0.8):

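// Requires PDFBox 2.0.x: org.apache.pdfbox.pdmodel.{PDDocument, PDPage, PDPageContentStream}
// and java.io.{IOException, OutputStream}.
private void saveCompressedPDF(PDDocument srcDoc, OutputStream os) throws IOException {
    // Build a fresh document and copy over the document information dictionary.
    PDDocument outDoc = new PDDocument();
    outDoc.setDocumentInformation(srcDoc.getDocumentInformation());
    for (PDPage srcPage : srcDoc.getPages()) {
        // Open an appending content stream with compress = true and close it at once:
        // this appends an empty, Flate-compressed stream to the page's /Contents.
        new PDPageContentStream(outDoc, srcPage,
                PDPageContentStream.AppendMode.APPEND, true).close();
        // Reuse the source page object directly in the new document.
        outDoc.addPage(srcPage);
    }
    // srcDoc must stay open until this save finishes: the pages' streams are still read from it.
    outDoc.save(os);
    outDoc.close();
}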

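Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.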

© 2020-2024 STACKOOM.COM