
Java: combining 2000-5000 PDFs into one using iText yields OutOfMemoryError

I have been eyeballing this code for a long time, trying to reduce the amount of memory it uses, and it still fails with java.lang.OutOfMemoryError: Java heap space. As a last resort, I want to ask the community how I can improve this code to avoid the OutOfMemoryError.

I have a driver/manifest file (a .txt file) that contains information about the PDFs. I have about 2000-5000 PDFs inside a zip file that I need to combine. Before combining, I need to add 2-3 more PDF pages to each PDF. A Manifest object holds information about one PDF.

try{
    blankPdf = new PdfReader(new FileInputStream(config.getBlankPdf()));
    mdxBacker = new PdfReader(new FileInputStream(config.getMdxBacker()));
    theaBacker = new PdfReader(new FileInputStream(config.getTheaBacker()));
    mdxAffidavit = new PdfReader(new FileInputStream(config.getMdxAffidavit()));
    theaAffidavit = new PdfReader(new FileInputStream(config.getTheaAffidavit()));

    ImmutableList<Manifest> manifestList = //Read manifest file and obtain List<Manifest>
    File zipFile = new File(config.getInputDir() + File.separator + zipName);
    //Extracting PDF into `process` folder
    ZipUtil.extractAll(config.getExtractPdfDir(), zipFile);
    outputPdfName = zipName.replace(".zip", ".pdf");
    outputZipStream = new FileOutputStream(config.getOutputDir() + 
                                                    File.separator + outputPdfName);
    document = new Document(PageSize.LETTER, 0, 0, 0, 0);
    writer = new PdfCopy(document , outputZipStream);
    document.open();    //Open the document
    //Start combining PDF files together    
    for(Manifest m : manifestList){
        //Obtain full path to the current pdf
        String pdfFilePath = config.getExtractPdfDir() + File.separator + m.getPdfName();
        //Before combining PDF, add backer and affidavit to individual PDF
        PdfReader pdfReader = PdfUtil.addBackerAndAffidavit(config, pdfType, m, 
                pdfFilePath, blankPdf, mdxBacker, theaBacker, mdxAffidavit, 
            theaAffidavit);
        for(int pageNumber=1; pageNumber<=pdfReader.getNumberOfPages(); pageNumber++){
            document.newPage();
            PdfImportedPage page = writer.getImportedPage(pdfReader, pageNumber);
            writer.addPage(page);
        }
    }
} catch (DocumentException e) {

} catch (IOException e) {

} finally{
    if(document != null) document.close();
    try{
        if(outputZipStream != null) outputZipStream.close();
        if(writer != null) writer.close();
    }catch(IOException e){

    }
}

Please rest assured that I have looked at this code for a long time and rewritten it many times to reduce the amount of memory it uses. When the OutOfMemoryError occurs, there are still many PDF files that have not yet had the 2-3 extra pages added, so I think the problem is inside addBackerAndAffidavit. However, I close every resource I open, and it still throws the exception. Please help.

You need to invoke PdfWriter#freeReader() at the end of every loop iteration to free the PdfReader involved. PdfCopy#freeReader() overrides the method of the same name in PdfWriter and serves the same purpose. See also the javadoc:

freeReader

 public void freeReader(PdfReader reader) throws IOException 

Description copied from class: PdfWriter
Use this method to write the reader to the document and free the memory used by it. The main use is when concatenating multiple documents to keep the memory usage restricted to the current appending document.

Overrides:
freeReader in class PdfWriter

Parameters:
reader - the PdfReader to free

Throws:
IOException - on error
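
Applied to a merge loop like the one in the question, the advice could look roughly like the sketch below. It assumes the iText 5 package names (com.itextpdf.text); the class name PdfMerger, the method merge, and the plain list of file paths are hypothetical stand-ins for the asker's Manifest and PdfUtil code, which is not shown in full. The point is only the placement of freeReader() and close() at the end of each iteration.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

// Hypothetical helper class, not part of iText or of the asker's code.
public class PdfMerger {

    /** Concatenates the PDFs at inputPaths, in order, into outputPath. */
    public static void merge(List<String> inputPaths, String outputPath)
            throws IOException, DocumentException {
        // The stream is closed last, after the document has been closed.
        try (OutputStream out = new FileOutputStream(outputPath)) {
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, out);
            document.open();
            try {
                for (String path : inputPaths) {
                    PdfReader reader = new PdfReader(path);
                    try {
                        int pageCount = reader.getNumberOfPages();
                        for (int pageNumber = 1; pageNumber <= pageCount; pageNumber++) {
                            copy.addPage(copy.getImportedPage(reader, pageNumber));
                        }
                        // Flush this reader's content to the output and let the
                        // writer drop its references to the reader.
                        copy.freeReader(reader);
                    } finally {
                        // Release the reader's underlying file resources.
                        reader.close();
                    }
                }
            } finally {
                // Closing the document also closes the PdfCopy attached to it.
                document.close();
            }
        }
    }
}
```

A call such as PdfMerger.merge(paths, "combined.pdf") would then keep only one source reader registered with the writer at a time. Readers that are reused across iterations, such as the backer and affidavit readers in the question, would presumably be left open until the whole merge has finished.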
