
How to add content to an existing PDF document smoothly?

The requirement: I have an existing PDF document, and I want to insert a paragraph (a summary of something) at the very beginning of it, on the first page.

I'm using the iText 2.1.5 library to import the existing PDF document and do the insertion. I already have a solution, but it is not very satisfying. My current approach is to shrink the existing first page so that it looks smaller and occupies less space, and then put the new paragraph above it. But the customer is not satisfied with this solution: they think the font size is inconsistent throughout the new PDF document (the first page's text looks smaller than the other pages' because of the shrinking).

So I wonder whether there is a better way to accomplish this, that is, to insert some content into an existing PDF as smoothly as one would in a Word document?

Thanks!

EDIT: Why did I get a down vote?

There isn't really any practical way to do this. As with any type of document, it is theoretically possible to make any change to a PDF, but doing so is rather like trying to debug a program without the source code: even a minor change in the object code would force you to move everything around, and you'd have to edit all kinds of things that aren't designed to be human-editable. As a practical matter, the only solution is to make the change in the source code and then recompile it.

PDF is a page description language; its purpose is to specify exactly what the page will look like, and it has to do so in such excruciating detail that every PDF reader on every platform will produce exactly the same product. This includes not only the page content (text, images, etc.) and formatting (which text is bolded, which is centered, etc.), but also the fonts themselves, the exact XY coordinates of every object, and all kinds of other details which are so arcane that I can only guess at what they might be, and which no human should ever have to deal with unless they are authoring a PDF reader.

To add a paragraph of text to an existing PDF, you'd have to know every single detail of this, and you'd have to recalculate much of it to accommodate the additional paragraph. In addition to being mind-numbing, that would involve reinventing a significant amount of nontrivial logic to figure out where exactly everything goes on the page.
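To make the recalculation concrete, here is a minimal sketch of a deliberately simplified model: each line of text is pinned to an absolute baseline, the way objects in a PDF page are, and a new paragraph at the top pushes everything down. The class and method names (`ReflowSketch`, `PlacedLine`, `pushDown`) and all the numbers are hypothetical; a real PDF also has fonts, images, multi-column layouts and so on, none of which this handles.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model only: real PDF pages contain far more than positioned lines.
final class ReflowSketch {

    /** A line of text pinned to an absolute baseline (in points, origin at the bottom of the page). */
    static final class PlacedLine {
        final String text;
        final double y;

        PlacedLine(String text, double y) {
            this.text = text;
            this.y = y;
        }
    }

    /** Lines that still fit on the page after the shift, and lines that fell below the margin. */
    static final class Reflow {
        final List<PlacedLine> kept = new ArrayList<>();
        final List<PlacedLine> overflow = new ArrayList<>();
    }

    /** Shifts every line down by insertedHeight and splits off whatever no longer fits. */
    static Reflow pushDown(List<PlacedLine> lines, double insertedHeight, double bottomMargin) {
        Reflow result = new Reflow();
        for (PlacedLine line : lines) {
            // The y axis grows upward, so moving down means subtracting.
            PlacedLine moved = new PlacedLine(line.text, line.y - insertedHeight);
            if (moved.y >= bottomMargin) {
                result.kept.add(moved);
            } else {
                // These lines would have to be re-placed on the next page,
                // which in turn pushes that page's content down, and so on.
                result.overflow.add(moved);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<PlacedLine> page = List.of(
                new PlacedLine("Heading", 770),
                new PlacedLine("Body line", 400),
                new PlacedLine("Last line", 90),
                new PlacedLine("Footnote", 60));
        Reflow r = pushDown(page, 50, 50); // hypothetical 50 pt paragraph, 50 pt bottom margin
        System.out.println("still on page: " + r.kept.size());
        System.out.println("pushed off page: " + r.overflow.size());
    }
}
```

Even in this toy version, a single inserted paragraph changes the position of every line and spills some of them onto the following page, which then needs the same treatment.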

It's not worth it.

If all the documents you'll be dealing with have exactly the same layout, and you have a template or otherwise have the ability to create one like them, then you could programmatically extract the text content from the PDF, use it plus your new paragraph to fill in the template, and then render that as a PDF. For the first step (extracting the text), Apache PDFBox, an open source Java library for dealing with PDF documents, is a popular choice.
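The middle step (filling the template) could look roughly like the sketch below. It uses only the Java standard library; the `{{name}}` placeholder syntax, the `TemplateFill` class and the sample strings are assumptions made for illustration, and the extraction and PDF rendering steps are left to whatever library you choose.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fills {{name}} placeholders in a plain-text template.
final class TemplateFill {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\{\\{(\\w+)\\}\\}");

    /** Replaces each {{key}} with its value in one pass; unknown placeholders are left as they are. */
    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement keeps '$' and '\' in the value from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "{{summary}}\n\n{{body}}";
        // Stand-in for the text a PDF extraction library would return.
        String extracted = "Text extracted from the original PDF.";
        String summary = "The new summary paragraph.";
        System.out.println(fill(template, Map.of("summary", summary, "body", extracted)));
    }
}
```

The filled-in text would then go through the same layout code that produces your other PDFs, so the font size stays consistent on every page.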

If the documents are at all heterogeneous, then you'll have to insist that your customer provide you with the documents in a transparent format; that is, one which describes the document's content and formatting, rather than the details of how exactly to render it. Anything that you can edit in a fully-featured word processor (plain text, Rich Text Format, OpenDocument, Office Open XML) qualifies. Java libraries exist for all of those formats (though I have no idea how good they are), and they are supported by both Microsoft Word and LibreOffice, so your customers probably created the documents in one of those formats in the first place.

If you must shrink the existing PDF contents to fit the new content AND the customer doesn't like the font shrinking, then you can't solve the problem this way. It would be quite a feat to deliver on conflicting requirements.
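A rough worked example shows why the two requirements conflict. The sketch below assumes the first page is scaled uniformly to free up space at the top; the page height (A4, 842 pt), the 150 pt needed for the summary and the 12 pt body font are all hypothetical numbers, and `ShrinkScale` is not part of any library.

```java
// Back-of-the-envelope arithmetic; all sizes are in PDF points (1/72 inch).
final class ShrinkScale {

    /** Uniform scale factor needed to free reservedHeight points at the top of the page. */
    static double scaleFor(double pageHeight, double reservedHeight) {
        if (reservedHeight < 0 || reservedHeight >= pageHeight) {
            throw new IllegalArgumentException("reservedHeight must be in [0, pageHeight)");
        }
        return (pageHeight - reservedHeight) / pageHeight;
    }

    /** Size at which text of originalSize appears once the page content is scaled. */
    static double apparentFontSize(double originalSize, double scale) {
        return originalSize * scale;
    }

    public static void main(String[] args) {
        double pageHeight = 842;    // assumed A4 page
        double summaryHeight = 150; // hypothetical space needed for the new paragraph
        double bodyFont = 12;       // hypothetical body font size

        double scale = scaleFor(pageHeight, summaryHeight);
        System.out.println("scale factor: " + scale);
        System.out.println("apparent font size on page 1: " + apparentFontSize(bodyFont, scale));
    }
}
```

With these numbers the scale factor is about 0.82, so 12 pt text on the first page renders at just under 10 pt while every other page stays at 12 pt. Any amount of space reserved for the new paragraph shrinks the first page's text by the same proportion.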

If the source PDFs are static (or change rarely), then you should probably mimic them and simply generate the complete PDFs on demand, allowing for the additions you need to make (inserting a paragraph). You could extend your use of iText if you can code up the necessary layouts, or use Docmosis or JODReports.

If your source PDFs vary or are dynamic, then, as Taymon indicated, you don't have much chance. If you search the net using a search string like "purpose of PDF", you'll find some good reference information about why it's not easy to edit.
