Java: combining 2000-5000 PDFs into one using iText yields OutOfMemoryError

Question

I have been staring at this code for a long time, trying to reduce the amount of memory it uses, and it still throws java.lang.OutOfMemoryError: Java heap space. As a last resort, I want to ask the community how I can improve this code to avoid the OutOfMemoryError.

I have a driver/manifest file (a .txt file) that contains information about the PDFs. I have about 2000-5000 PDFs inside a zip file that I need to combine. Before combining, I need to add 2-3 more pages to each PDF. A Manifest object holds the information about one PDF.

try{
    blankPdf = new PdfReader(new FileInputStream(config.getBlankPdf()));
    mdxBacker = new PdfReader(new FileInputStream(config.getMdxBacker()));
    theaBacker = new PdfReader(new FileInputStream(config.getTheaBacker()));
    mdxAffidavit = new PdfReader(new FileInputStream(config.getMdxAffidavit()));
    theaAffidavit = new PdfReader(new FileInputStream(config.getTheaAffidavit()));

    ImmutableList<Manifest> manifestList = //Read manifest file and obtain List<Manifest>
    File zipFile = new File(config.getInputDir() + File.separator + zipName);
    //Extracting PDF into `process` folder
    ZipUtil.extractAll(config.getExtractPdfDir(), zipFile);
    outputPdfName = zipName.replace(".zip", ".pdf");
    outputZipStream = new FileOutputStream(config.getOutputDir() + 
                                                    File.separator + outputPdfName);
    document = new Document(PageSize.LETTER, 0, 0, 0, 0);
    writer = new PdfCopy(document , outputZipStream);
    document.open();    //Open the document
    //Start combining PDF files together    
    for(Manifest m : manifestList){
        //Obtain full path to the current pdf
        String pdfFilePath = config.getExtractPdfDir() + File.separator + m.getPdfName();
        //Before combining PDF, add backer and affidavit to individual PDF
        PdfReader pdfReader = PdfUtil.addBackerAndAffidavit(config, pdfType, m, 
                pdfFilePath, blankPdf, mdxBacker, theaBacker, mdxAffidavit, 
            theaAffidavit);
        for(int pageNumber=1; pageNumber<=pdfReader.getNumberOfPages(); pageNumber++){
            document.newPage();
            PdfImportedPage page = writer.getImportedPage(pdfReader, pageNumber);
            writer.addPage(page);
        }
    }
} catch (DocumentException e) {

} catch (IOException e) {

} finally{
    if(document != null) document.close();
    try{
        if(outputZipStream != null) outputZipStream.close();
        if(writer != null) writer.close();
    }catch(IOException e){

    }
}

Please rest assured that I have looked at this code for a long time and have rewritten it many times to reduce the amount of memory it uses. When the OutOfMemoryError occurs, there are still many PDF files that have not yet had their 2-3 extra pages added, so I think the problem is inside addBackerAndAffidavit. However, I close every resource I open, and it still throws the exception. Please help.

Answer 1

You need to call PdfCopy#freeReader() at the end of each iteration of the outer loop to release the PdfReader whose pages you have just copied. PdfCopy overrides the method it inherits from PdfWriter, and it serves the same purpose. See also the javadoc:

freeReader
public void freeReader(PdfReader reader) throws IOException
Description copied from class: PdfWriter
Use this method to writes the reader to the document and free the memory used by it. The main use is when concatenating multiple documents to keep the memory usage restricted to the current appending document.

Overrides:
freeReader in class PdfWriter

Parameters:
reader - the PdfReader to free

Throws:
IOException - on error
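
Applied to a merge loop, the advice above might look like the following minimal sketch. It assumes iText 5 package names (com.itextpdf.text); older releases use com.lowagie.text instead. The class name PdfMergeSketch, the merge method, and the file paths are hypothetical and are not part of the question's code. In the question's own loop, the equivalent change would be a call to writer.freeReader(pdfReader) right after the inner page loop.

```java
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;

public class PdfMergeSketch {

    // Copies every page of each input PDF into a single output PDF and
    // returns the number of pages written. Hypothetical helper for illustration.
    public static int merge(List<String> inputPaths, String outputPath)
            throws IOException, DocumentException {
        if (inputPaths.isEmpty()) {
            // Closing an iText Document that has no pages fails, so reject this early.
            throw new IllegalArgumentException("no input PDFs");
        }
        int pagesWritten = 0;
        try (OutputStream out = new FileOutputStream(outputPath)) {
            Document document = new Document();
            PdfCopy copy = new PdfCopy(document, out);
            document.open();
            for (String path : inputPaths) {
                PdfReader reader = new PdfReader(path);
                try {
                    int pageCount = reader.getNumberOfPages();
                    for (int page = 1; page <= pageCount; page++) {
                        copy.addPage(copy.getImportedPage(reader, page));
                        pagesWritten++;
                    }
                    // Flush this reader's objects to the output and drop the
                    // writer's references to it, so it can be garbage collected.
                    copy.freeReader(reader);
                } finally {
                    reader.close();
                }
            }
            // Finishes the PDF; the stream itself is closed by try-with-resources.
            document.close();
        }
        return pagesWritten;
    }
}
```

The key point is that freeReader is called once per source document, after its last page has been added, so the writer should hold on to only the reader that is currently being appended.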

Answer 1: score 4, accepted, 2011-09-26 20:17:27
