
Converting a PDF to a Word document using Java

I've successfully converted a JPEG to PDF using Java, but I don't know how to convert a PDF to Word. The code I used for the JPEG-to-PDF conversion is given below.

Can anyone tell me how to convert a PDF to Word (.doc/.docx) using Java?

import java.io.FileOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Document;

public class JpegToPDF {
    public static void main(String[] args) {
        try {
            Document convertJpgToPdf = new Document();
            PdfWriter.getInstance(convertJpgToPdf, new FileOutputStream(
                    "c:\\java\\ConvertImagetoPDF.pdf"));
            convertJpgToPdf.open();
            Image convertJpg = Image.getInstance("c:\\java\\test.jpg");
            convertJpgToPdf.add(convertJpg);
            convertJpgToPdf.close();
            System.out.println("Successfully Converted JPG to PDF in iText");
        } catch (Exception i1) {
            i1.printStackTrace();
        }
    }
}

In fact, you need two libraries, both of them open source. The first is iText, which is used to extract the text from the PDF file. The second is Apache POI, which is used to create the Word document.

The code is quite simple:

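import java.io.FileOutputStream;
import java.io.IOException;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;
import org.apache.poi.xwpf.usermodel.BreakType;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;

public class PdfToWord {
    public static void main(String[] args) throws IOException {
        // Create the Word document
        XWPFDocument doc = new XWPFDocument();

        // Open the PDF file
        String pdf = "myfile.pdf";
        PdfReader reader = new PdfReader(pdf);
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);

        // Read the PDF page by page (iText page numbers start at 1)
        int pageCount = reader.getNumberOfPages();
        for (int i = 1; i <= pageCount; i++) {
            TextExtractionStrategy strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
            // Extract the text
            String text = strategy.getResultantText();
            // Create a new paragraph in the Word document and add the extracted text
            XWPFParagraph p = doc.createParagraph();
            XWPFRun run = p.createRun();
            run.setText(text);
            // Add a page break between pages, but not after the last one
            if (i < pageCount) {
                run.addBreak(BreakType.PAGE);
            }
        }
        // Write the Word document
        FileOutputStream out = new FileOutputStream("myfile.docx");
        doc.write(out);
        // Close all open files
        out.close();
        doc.close();
        reader.close();
    }
}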

Beware: with the extraction strategy used here (SimpleTextExtractionStrategy), you will lose all formatting. You can fix this by plugging in your own, more complex extraction strategy.
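Even without a custom strategy, the line structure of a page can usually be kept. The extracted page text typically contains newline characters, and a single run.setText(text) call generally does not turn those into separate lines in Word. Here is a minimal sketch using only the standard library; PageTextSplitter and splitLines are hypothetical names for illustration and are not part of iText or POI:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not an iText or POI class).
public class PageTextSplitter {

    // Splits the text of one page into trimmed, non-empty lines.
    public static List<String> splitLines(String pageText) {
        List<String> lines = new ArrayList<>();
        if (pageText == null) {
            return lines;
        }
        // \R matches any line terminator (\n, \r\n, \r, ...)
        for (String line : pageText.split("\\R")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                lines.add(trimmed);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Sample input standing in for strategy.getResultantText()
        List<String> lines = splitLines("Title\n\nFirst line \r\nSecond line");
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```

Each returned string could then go into its own paragraph via doc.createParagraph().createRun().setText(line), in place of the single run.setText(text) call. This only recovers line breaks; fonts, tables and images would still need a custom extraction strategy.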

You can use the 7-PDF library.

Have a look at this; it may help:

http://www.7-pdf.de/sites/default/files/guide/manuals/library/index.html

PS: iText has some issues when the given file is a non-RGB image, so try this out!

Although it's far from being a pure Java solution, OpenOffice/LibreOffice allows one to connect to it through a TCP port; it's possible to use that to convert documents. If this looks like an acceptable solution, JODConverter can help you.
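As a rough illustration of driving LibreOffice from Java, the sketch below uses a different and simpler route than the TCP connection described above: it builds a one-shot headless command line for the soffice executable. It assumes LibreOffice is installed and that soffice is on the PATH. SofficeCommand and build are hypothetical names, and whether the writer_pdf_import filter gives usable results will depend on the LibreOffice version and on the PDF itself:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: builds (but does not run) a headless LibreOffice command.
public class SofficeCommand {

    public static List<String> build(String sofficePath, String pdfPath, String outDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(sofficePath);
        cmd.add("--headless");                    // no GUI
        cmd.add("--infilter=writer_pdf_import");  // assumption: open the PDF in Writer
        cmd.add("--convert-to");
        cmd.add("docx");
        cmd.add("--outdir");
        cmd.add(outDir);
        cmd.add(pdfPath);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("soffice", "myfile.pdf", "output");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires LibreOffice to be installed):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Keeping the command construction separate from process execution makes it easy to log or test the command first. For repeated conversions, a long-running office instance managed through JODConverter is likely the better fit.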

You can also try another library called Free Spire.PDF for Java to convert PDF to Word. The following code snippet is for your reference.

import com.spire.pdf.*;

public class PDFToWord {
    public static void main(String[] args) {
        //create a PdfDocument object
        PdfDocument doc = new PdfDocument();

        //load a sample PDF file
        doc.loadFromFile("C:\\Users\\Test1\\Desktop\\Sample.pdf");

        //save as .doc file
        doc.saveToFile("output/ToDoc.doc",FileFormat.DOC);

        //save as .docx file
        doc.saveToFile("output/ToDocx.docx",FileFormat.DOCX);
        doc.close();
    }
}
