
How to extract attached files from PDF with itext7

How does one extract attached files from a PDF with itext7?

None of the sample code I found for iText 5 works any more.

A byte[] per file would be what I need, as in the itext5 example below:

    PdfReader reader = new PdfReader(SRC);
    Map<String, byte[]> files = new HashMap<String,byte[]>();
    PdfObject obj;

    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream)obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            }
            catch(UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            files.put(Integer.toString(i), b);
        }
    }

Thx /markus

You are searching for attachments using brute force instead of by querying the catalog for embedded files and querying page dictionaries for attachment annotations.

Anyway, if I were to port your code to iText 7, it would look like this:

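    // SRC is the path of the input PDF; DEST is a format string with one
    // placeholder for the object number.
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC));
    PdfObject obj;
    // Brute force: visit every indirect object and dump each stream.
    for (int i = 1; i <= pdfDoc.getNumberOfPdfObjects(); i++) {
        obj = pdfDoc.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            byte[] b;
            try {
                // Decoded bytes (stream filters applied).
                b = ((PdfStream) obj).getBytes();
            } catch (PdfException exc) {
                // Fall back to the raw bytes if the stream cannot be decoded.
                b = ((PdfStream) obj).getBytes(false);
            }
            // try-with-resources closes the file even if write() throws.
            try (FileOutputStream fos = new FileOutputStream(String.format(DEST, i))) {
                fos.write(b);
            }
        }
    }
    pdfDoc.close();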

The only change I made is that I write each stream to a file instead of collecting it in a map.
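For comparison, the non-brute-force lookup mentioned at the start (the catalog's embedded-files name tree plus file attachment annotations on the pages) could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name `EmbeddedFileExtractor` and its helper methods are made up for illustration, it walks the low-level dictionaries directly, and it assumes each file specification stores its data under `/EF` with a `/UF` or `/F` entry. It returns a `byte[]` per file, keyed by file name, as asked for in the question.

```java
import com.itextpdf.kernel.pdf.PdfArray;
import com.itextpdf.kernel.pdf.PdfDictionary;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfName;
import com.itextpdf.kernel.pdf.PdfStream;
import com.itextpdf.kernel.pdf.PdfString;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;

import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, named for illustration only.
public class EmbeddedFileExtractor {

    /** Returns file name -> decoded bytes for the attachments found in the document. */
    public static Map<String, byte[]> extract(PdfDocument pdfDoc) {
        Map<String, byte[]> files = new LinkedHashMap<>();

        // 1. Document-level attachments: catalog /Names -> /EmbeddedFiles name tree.
        PdfDictionary catalog = pdfDoc.getCatalog().getPdfObject();
        PdfDictionary names = catalog.getAsDictionary(PdfName.Names);
        if (names != null) {
            walkNameTree(names.getAsDictionary(PdfName.EmbeddedFiles), files);
        }

        // 2. Page-level attachments: /FileAttachment annotations carry a file spec in /FS.
        for (int p = 1; p <= pdfDoc.getNumberOfPages(); p++) {
            int n = 0;
            for (PdfAnnotation annot : pdfDoc.getPage(p).getAnnotations()) {
                if (PdfName.FileAttachment.equals(annot.getSubtype())) {
                    PdfDictionary fs = annot.getPdfObject().getAsDictionary(PdfName.FS);
                    addFileSpec(fs, "page" + p + "-annotation" + (n++), files);
                }
            }
        }
        return files;
    }

    // A name tree node has /Kids (child nodes) and/or /Names (key, value, key, value, ...).
    private static void walkNameTree(PdfDictionary node, Map<String, byte[]> files) {
        if (node == null) {
            return;
        }
        PdfArray kids = node.getAsArray(PdfName.Kids);
        if (kids != null) {
            for (int i = 0; i < kids.size(); i++) {
                walkNameTree(kids.getAsDictionary(i), files);
            }
        }
        PdfArray pairs = node.getAsArray(PdfName.Names);
        if (pairs != null) {
            for (int i = 0; i + 1 < pairs.size(); i += 2) {
                PdfString key = pairs.getAsString(i);
                String fallback = key != null ? key.toUnicodeString() : "attachment-" + i;
                addFileSpec(pairs.getAsDictionary(i + 1), fallback, files);
            }
        }
    }

    // Reads the embedded stream of one file specification, if it has one.
    private static void addFileSpec(PdfDictionary fileSpec, String fallbackName,
                                    Map<String, byte[]> files) {
        if (fileSpec == null) {
            return;
        }
        PdfDictionary ef = fileSpec.getAsDictionary(PdfName.EF);
        if (ef == null) {
            return; // refers to an external file, nothing embedded
        }
        PdfStream stream = ef.getAsStream(PdfName.UF);
        if (stream == null) {
            stream = ef.getAsStream(PdfName.F);
        }
        if (stream == null) {
            return;
        }
        PdfString fileName = fileSpec.getAsString(PdfName.UF);
        if (fileName == null) {
            fileName = fileSpec.getAsString(PdfName.F);
        }
        String name = fileName != null ? fileName.toUnicodeString() : fallbackName;
        // Note: two attachments with the same file name overwrite each other here.
        files.put(name, stream.getBytes());
    }
}
```

Usage would be along the lines of `Map<String, byte[]> files = EmbeddedFileExtractor.extract(new PdfDocument(new PdfReader(SRC)));`, followed by closing the document. Unlike the brute-force loop, this returns only real attachments and not every content stream, font, or image in the file.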
