
How to add metadata to a PDF document using PDFBox?

I have an input stream of a PDF document available to me. I would like to add subject metadata to the document and then save it. I'm not sure how to do this.

I came across a sample recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html

However, it is still fuzzy to me. Below is what I'm trying, with my questions marked in the comments:

PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; // What goes here? How can I add a subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false);
catalog.setMetadata(newMetadata);
// Does anything else need to happen to save the document?
// I would like an OutputStream of the document (with metadata) so that I can save it to an S3 bucket.

The following code sets the title of a PDF document, but it should be adaptable to work with other properties as well:

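import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
    // try-with-resources closes the document even if saving fails
    try (PDDocument document = PDDocument.load(documentBytes)) {
        PDDocumentInformation info = document.getDocumentInformation();
        info.setTitle(title); // for the subject field, call info.setSubject(...) instead
        // Save into memory so the caller gets the modified PDF as bytes
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
        return null; // callers must handle a null result on failure
    }
}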

Apache PDFBox is required. With Maven, for example, add this dependency:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.6</version>
</dependency>

Add a title with:

byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");
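
Since the question asks for a stream that can be sent to an S3 bucket, the byte array returned by insertTitlePdf can be wrapped in an InputStream. The following is a minimal sketch using only the JDK. The class and method names are hypothetical, the sample bytes are a stand-in for a real PDF, and the upload client itself is not shown:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class PdfBytesAsStream {

    // Wraps the saved PDF bytes in an in-memory stream; the array is not copied.
    public static InputStream asInputStream(byte[] pdfBytes) {
        return new ByteArrayInputStream(pdfBytes);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in bytes; in practice this would be the result of insertTitlePdf(...)
        byte[] pdfBytes = {'%', 'P', 'D', 'F'};
        try (InputStream in = asInputStream(pdfBytes)) {
            // For an in-memory stream, available() reports the remaining byte count
            System.out.println("bytes available: " + in.available());
        }
    }
}
```

If the upload API needs a content length, pdfBytes.length is the value to pass, since the whole document is already in memory.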

Display it in the browser with (JSF example):

<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>
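
The getDocumentBytesWithTitleAsBase64() getter referenced in the markup is not shown in the answer. One plausible way to write it, sketched here with the JDK's java.util.Base64, is to encode the bytes returned by insertTitlePdf. The class and method names below are hypothetical:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PdfDataUri {

    // Encodes raw PDF bytes as Base64 text (standard alphabet, with padding).
    public static String toBase64(byte[] pdfBytes) {
        return Base64.getEncoder().encodeToString(pdfBytes);
    }

    // Builds the full value for the data attribute of the <object> tag.
    public static String toDataUri(byte[] pdfBytes) {
        return "data:application/pdf;base64," + toBase64(pdfBytes);
    }

    public static void main(String[] args) {
        // Stand-in bytes: just the PDF header, not a complete document
        byte[] sample = "%PDF-1.4".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri(sample));
        // prints data:application/pdf;base64,JVBERi0xLjQ=
    }
}
```

A bean getter would then return toBase64(documentBytesWithTitle). Note that data URIs inline the whole file into the page, so this approach suits small documents best.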

Result (Chrome):

[Image: result of the PDF title change]

Another much easier way to do this would be to use the built-in Document Information object:

PDDocument inputDoc = ...; // your document
inputDoc.getDocumentInformation().setCreator("Some meta");
inputDoc.getDocumentInformation().setCustomMetadataValue("fieldName", "fieldValue");

This also has the benefit of not requiring the xmpbox library.

This answer uses xmpbox and comes from the AddMetadataFromDocInfo example in the source code download:

// doc is the already loaded PDDocument
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setDescription("descr"); // dc:description is the XMP counterpart of the document subject
// Serialize the XMP packet to bytes
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
// Attach the serialized packet to the document catalog
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);
