
Editing PDF text using Java and iText

Is there a way to edit the text of a PDF document, for example to find and replace specific text?

I have a PDF document that contains text placeholders, which I need to identify and then either replace or delete.

I am able to edit the PDF at specific coordinates (x, y), but I am unable to identify and replace text. All the libraries I looked at create PDFs from scratch and offer only limited editing functionality. Is there any way to do the editing described above using iText? Please advise. Thank you!

**Example:** A PDF document contains the following paragraph. In this paragraph, I need to identify DATE: and FROM: as text and replace them with something else.

The oldest classical Greek and Latin writing had little or no spaces between words or other ones, and could be written in boustrophedon (alternating directions). Over time, text direction (left to right) became standardized, and word dividers and terminal punctuation became common. **DATE: FROM: The first way to divide sentences into groups was the original paragraphos, similar to an underscore at the beginning of the new group -----------------------------------------------------------**

Allow me to copy the intro of chapter 6 of my book:

When I wrote the first book about iText, the publisher didn't like the subtitle “Creating and Manipulating PDF.” He didn't like the word manipulating because of some of its pejorative meanings. If you consult the dictionary on Yahoo! education, you'll find the following definitions:

  • To influence or manage shrewdly or deviously
  • To tamper with or falsify for personal gain

Obviously, that's not what the book is about. The publisher suggested “Creating and Editing PDF” as a better subtitle. I explained that PDF isn't a document format well suited for editing. PDF is an end product. It's a display format. It's not a word processing format.

In a word processing format, the content is distributed over different pages when you open the document in an application, not earlier. This has some disadvantages: if you open the same document in different applications, you can end up with a different page count. The same text snippet can be on page X when looked at in Microsoft Word, and on page Y when viewed in Open Office. That's exactly the kind of problem you want to avoid by choosing PDF.

In a PDF document, every character or glyph on a PDF page has its fixed position, regardless of the application that's used to view the document. This is an advantage, but it also comes with a disadvantage. Suppose you want to replace the word “edit” with the word “manipulate” in a sentence: you'd have to reflow the text. You'd have to reposition all the characters that follow that word. Maybe you'd even have to move a portion of the text to the next page. That's not trivial, if not impossible.
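As a rough illustration of the reflow problem described above, the following Java sketch models a line of text as characters with fixed x positions. It is a toy model only: the constant `ADVANCE` width, the margin values, and all class and method names are assumptions made for illustration. None of this is iText API, and it is not how a PDF file actually stores text.

```java
// Toy model: every character gets an absolute x position, as on a finished page.
class FixedGlyphDemo {
    static final double ADVANCE = 6.0; // assumed constant glyph width, in points

    // Absolute x position of every character, laid out left to right from startX.
    static double[] positions(String text, double startX) {
        double[] xs = new double[text.length()];
        for (int i = 0; i < xs.length; i++) {
            xs[i] = startX + i * ADVANCE;
        }
        return xs;
    }

    // How far every glyph after the replaced word has to move.
    static double shift(String target, String replacement) {
        return (replacement.length() - target.length()) * ADVANCE;
    }

    // Number of glyphs after the first occurrence of target that must be repositioned.
    static int glyphsToMove(String text, String target, String replacement) {
        int at = text.indexOf(target);
        if (at < 0 || shift(target, replacement) == 0) {
            return 0;
        }
        return text.length() - (at + target.length());
    }

    // True if the laid-out text extends past the right margin.
    static boolean overflows(String text, double startX, double rightMargin) {
        return startX + text.length() * ADVANCE > rightMargin;
    }

    public static void main(String[] args) {
        String before = "You can edit a PDF.";
        String after = before.replace("edit", "manipulate");

        double[] xsBefore = positions(before, 36);
        double[] xsAfter = positions(after, 36);

        System.out.println("glyphs to move:  " + glyphsToMove(before, "edit", "manipulate"));
        System.out.println("shift in points: " + shift("edit", "manipulate"));
        System.out.println("last glyph x before: " + xsBefore[xsBefore.length - 1]);
        System.out.println("last glyph x after:  " + xsAfter[xsAfter.length - 1]);
        System.out.println("overflows before: " + overflows(before, 36, 160));
        System.out.println("overflows after:  " + overflows(after, 36, 160));
    }
}
```

With these assumed numbers, swapping in the longer word pushes every later glyph to the right and the line no longer fits inside the margin, which is the point where a real document would need its text reflowed.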

If you want to “edit” a PDF, it's advised that you change the original source of the document and remake the PDF. If the original document was written using Microsoft Word, change the Word document, and make the PDF from the new version of the Word document. Don't expect any tool to be able to edit a PDF file the same way you'd edit a Word document.

This being said, the verb “to manipulate” also means

  • To move, arrange, operate, or control by the hands or by mechanical means, especially in a skillful manner

That's exactly what you're going to do in this chapter. Using iText, you're going to manipulate the pages of a PDF file in a skillful manner. You're going to treat a PDF document as if it were made of digital paper.

In your question, you say: "All the libraries that I saw created PDF from scratch and small editing functionality."

Well, that's only normal. It's inherent to the document format you've chosen. Your design that involves "placeholders for text that you need to identify and replace or just delete" is seriously flawed. It suffers from a wrong choice of document format. You should have chosen a format that is suited for editing. PDF isn't such a format.
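The advice above amounts to doing the find-and-replace in an editable source and only then producing the PDF. Here is a minimal sketch of that first step, assuming the source is available as plain text containing the DATE: and FROM: placeholders from the question. The class name, the `fill` helper, and the sample values are hypothetical, and the conversion to PDF is left out because it depends on the toolchain used.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: fill placeholders in the editable source, before any PDF is generated.
class SourceTemplateDemo {

    // Replace every occurrence of each placeholder key with its value.
    static String fill(String source, Map<String, String> values) {
        String result = source;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical source text; in practice this would be the original document.
        String source = "DATE: FROM: The first way to divide sentences into groups";

        // Made-up sample values, for illustration only.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("DATE:", "Date: 1 January 2020");
        values.put("FROM:", "From: Jane Doe");

        String filled = fill(source, values);
        System.out.println(filled);
        // The filled text would then be handed to whatever tool creates the PDF.
    }
}
```

Because the replacement happens before layout, the PDF generator positions the new text itself, and no glyphs in an existing PDF have to be moved.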
