How to extract attached files from PDF with itext7

Question

How does one extract attached files from a PDF with itext7?

None of the sample code I found for iText 5 works any more.

What I need is a byte[] per file, as in the iText 5 example below:

    PdfReader reader = new PdfReader(SRC);
    Map<String, byte[]> files = new HashMap<String,byte[]>();
    PdfObject obj;

    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream)obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            }
            catch(UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            files.put(Integer.toString(i), b);
        }
    }

Thx /markus

Answer 1

You are searching for attachments by brute force instead of querying the catalog for embedded files and the page dictionaries for file attachment annotations.
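Querying the structure means two lookups: the /EmbeddedFiles name tree under the catalog's /Names dictionary, and /FileAttachment annotations (whose /FS entry is a file specification) in each page's /Annots array. In both cases the file bytes live in the file specification's /EF dictionary under /F. The following is a minimal sketch of that traversal over a hypothetical in-memory model (Java Map for dictionaries, List for arrays, byte[] for already-decoded streams) rather than iText's PdfDictionary/PdfArray/PdfStream classes, so only the walking logic is shown. AttachmentWalker and its methods are illustrative names, not an iText API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a PDF object graph (not iText classes):
// dictionary = Map<String, Object> (keys without the leading '/'),
// array = List<Object>, string or name = String, decoded stream = byte[].
public class AttachmentWalker {

    @SuppressWarnings("unchecked")
    static Map<String, Object> asDict(Object o) {
        return (Map<String, Object>) o;
    }

    // A file specification keeps the embedded file stream in /EF -> /F.
    static byte[] embeddedBytes(Object fileSpec) {
        if (!(fileSpec instanceof Map)) return null;
        Object ef = asDict(fileSpec).get("EF");
        if (!(ef instanceof Map)) return null;
        Object f = asDict(ef).get("F");
        return (f instanceof byte[]) ? (byte[]) f : null;
    }

    // Walks a name tree: /Names holds [key1 value1 key2 value2 ...], /Kids holds child nodes.
    static void collectFromNameTree(Map<String, Object> node, Map<String, byte[]> out) {
        Object names = node.get("Names");
        if (names instanceof List) {
            List<?> pairs = (List<?>) names;
            for (int i = 0; i + 1 < pairs.size(); i += 2) {
                byte[] bytes = embeddedBytes(pairs.get(i + 1));
                if (bytes != null) out.put(String.valueOf(pairs.get(i)), bytes);
            }
        }
        Object kids = node.get("Kids");
        if (kids instanceof List) {
            for (Object kid : (List<?>) kids) {
                if (kid instanceof Map) collectFromNameTree(asDict(kid), out);
            }
        }
    }

    // Looks for /FileAttachment annotations in each page's /Annots array.
    static void collectFromAnnotations(List<Map<String, Object>> pages, Map<String, byte[]> out) {
        for (int p = 0; p < pages.size(); p++) {
            Object annots = pages.get(p).get("Annots");
            if (!(annots instanceof List)) continue;
            List<?> list = (List<?>) annots;
            for (int a = 0; a < list.size(); a++) {
                if (!(list.get(a) instanceof Map)) continue;
                Map<String, Object> annot = asDict(list.get(a));
                if (!"FileAttachment".equals(annot.get("Subtype"))) continue;
                Object fs = annot.get("FS");
                byte[] bytes = embeddedBytes(fs);
                if (bytes == null) continue;
                Object name = asDict(fs).get("F"); // file name string of the file specification
                // Prefix with the 1-based page number so equal names on different pages don't collide
                out.put("page" + (p + 1) + "/" + (name instanceof String ? name : "annot" + a), bytes);
            }
        }
    }

    // Catalog /Names /EmbeddedFiles first, then the page annotations.
    static Map<String, byte[]> collectAll(Map<String, Object> catalog, List<Map<String, Object>> pages) {
        Map<String, byte[]> out = new LinkedHashMap<>();
        Object names = catalog.get("Names");
        if (names instanceof Map) {
            Object tree = asDict(names).get("EmbeddedFiles");
            if (tree instanceof Map) collectFromNameTree(asDict(tree), out);
        }
        collectFromAnnotations(pages, out);
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> spec = Map.of("F", "notes.txt", "EF", Map.of("F", "hello".getBytes()));
        Map<String, Object> catalog =
                Map.of("Names", Map.of("EmbeddedFiles", Map.of("Names", List.of("notes.txt", spec))));
        Map<String, byte[]> files = collectAll(catalog, List.of());
        files.forEach((name, bytes) -> System.out.println(name + ": " + bytes.length + " bytes"));
    }
}
```

With iText 7 the same keys would be read from the catalog and page dictionaries, and each /EF /F stream's bytes would then be fetched from the PdfStream, giving one byte[] per attachment without touching unrelated streams such as images or fonts.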

Anyway, if I were to port your code to iText 7, it would look like this:

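    import com.itextpdf.kernel.PdfException; // iText 7.0/7.1; in 7.2+ it is com.itextpdf.kernel.exceptions.PdfException
    import com.itextpdf.kernel.pdf.PdfDocument;
    import com.itextpdf.kernel.pdf.PdfObject;
    import com.itextpdf.kernel.pdf.PdfReader;
    import com.itextpdf.kernel.pdf.PdfStream;
    import java.io.FileOutputStream;

    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));
    PdfObject obj;
    for (int i = 1; i <= pdfDoc.getNumberOfPdfObjects(); i++) {
        obj = pdfDoc.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            byte[] b;
            try {
                // decoded bytes (the stream's filters are applied)
                b = ((PdfStream) obj).getBytes();
            } catch (PdfException exc) {
                // e.g. unsupported filter: fall back to the raw, still-encoded bytes
                b = ((PdfStream) obj).getBytes(false);
            }
            // try-with-resources closes the file even if write() throws
            try (FileOutputStream fos = new FileOutputStream(String.format(DEST, i))) {
                fos.write(b);
            }
        }
    }
    pdfDoc.close();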

The only change I made is that I write each stream to a file instead of putting it in a map.
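The try/catch in the ported loop relies on getBytes() returning the stream decoded through its filters and getBytes(false) returning the raw, still-encoded bytes. As a standalone illustration of that decode-or-fall-back pattern, here is a JDK-only sketch for FlateDecode (zlib-wrapped deflate data), the most common PDF stream filter. FlateFallback and its methods are hypothetical names and this is not iText code; other filters are out of its scope.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Standalone illustration (not iText code) of "decode if possible, else keep the raw bytes".
public class FlateFallback {

    // Inflates zlib data; throws DataFormatException if the data is invalid or incomplete.
    static byte[] inflate(byte[] encoded) throws DataFormatException {
        Inflater inflater = new Inflater(); // default (nowrap = false) expects the zlib header
        try {
            inflater.setInput(encoded);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Truncated input, or a preset dictionary we do not have
                    throw new DataFormatException("incomplete or unsupported zlib data");
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // release native zlib resources
        }
    }

    // Same shape as the try/catch around getBytes() / getBytes(false) in the answer.
    static byte[] decodedOrRaw(byte[] raw) {
        try {
            return inflate(raw);
        } catch (DataFormatException e) {
            return raw;
        }
    }

    public static void main(String[] args) {
        byte[] raw = "not zlib".getBytes();
        // Not valid zlib, so the raw bytes come back unchanged
        System.out.println(new String(decodedOrRaw(raw)));
    }
}
```

Returning the raw bytes keeps the data rather than losing it, at the cost that the caller must remember it may still be encoded.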

1 ACCEPTED 2016-06-14 06:18:00

How to extract attached files from PDF with itext7

Question

1 answers

solution1 1 ACCPTED 2016-06-14 06:18:00

solution1
1 ACCPTED 2016-06-14 06:18:00