Convert DOC file to DOCX with Java

I need to use DOCX files (actually the XML contained in them) in Java software I'm currently developing, but some people in my company still use the DOC format.

Do you know if there is a way to convert a DOC file to the DOCX format using Java? I know it's possible using C#, but that's not an option.

I googled it, but nothing came up...

Thanks
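
For context on "the XML contained in them": a DOCX file is a ZIP package, and the main body normally lives in the entry word/document.xml. Below is a minimal sketch of reading that entry with only the JDK (Java 9+). The class name DocxXmlReader is made up for illustration, and the code assumes the conventional entry name and UTF-8 encoding rather than resolving the package relationships.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any library mentioned in this thread.
public class DocxXmlReader {

    /** Returns the raw XML of the main document part of a DOCX file. */
    public static String readDocumentXml(Path docxFile) throws IOException {
        try (ZipFile zip = new ZipFile(docxFile.toFile())) {
            // Assumption: the main part sits at the conventional location.
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("No word/document.xml in " + docxFile);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                // Assumption: the part is UTF-8 encoded, which is what Word writes.
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

A binary DOC file has no such entry, which is why it has to be converted first.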

You may try Aspose.Words for Java. It allows you to load a DOC file and save it in DOCX format. The code is very simple, as shown below:

// Open a document.  
Document doc = new Document("input.doc"); 
// Save document. 
doc.save("output.docx");

Please see if this helps in your scenario.

Disclosure: I work as a developer evangelist at Aspose.

Check out JODConverter to see if it fits the bill. I haven't personally used it.

JODConverter calls OpenOffice/LibreOffice via a network protocol. It can therefore 'do anything you can do in OpenOffice'. This includes converting formats. But it only does as good a job as whatever version of OpenOffice you are running. I have some art in one of my docs, and it doesn't convert it as I hoped.

According to the Google Code website for v3, JODConverter is no longer supported.

To get JOD to do the job, you need to do something like this:

private static void transformBinaryWordDocToDocX(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
    Collections.singletonMap("FilterName", "MS Word 2007 XML"));

    converter.convert(in, out, docx);
}


private static void transformBinaryWordDocToW2003Xml(File in, File out)
{
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    DocumentFormat w2003xml = new DocumentFormat("Microsoft Word 2003 XML", "xml", "text/xml");
    w2003xml.setInputFamily(DocumentFamily.TEXT);
    w2003xml.setStoreProperties(DocumentFamily.TEXT, Collections.singletonMap("FilterName", "MS Word 2003 XML"));
    converter.convert(in, out, w2003xml);
}



private static OfficeManager officeManager;

@BeforeClass
public static void setupStatic() throws IOException {

          /*officeManager = new DefaultOfficeManagerConfiguration()
      .setOfficeHome("C:/Program Files/LibreOffice 3.6")
      .buildOfficeManager();
      */

    officeManager = new ExternalOfficeManagerConfiguration().setConnectOnStart(true).setPortNumber(8100).buildOfficeManager();


    officeManager.start();
}

@AfterClass
public static void shutdownStatic() throws IOException {

    officeManager.stop();
}

For this to work, you need to be running LibreOffice as a networked server (I could not get the 'run on demand' part of JODConverter to work very well under Windows with LO 3.6).
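
Running LibreOffice as a networked server means starting soffice with an --accept option, so that it listens on the port the ExternalOfficeManagerConfiguration connects to (8100 above). A rough sketch of doing that from Java follows. The class name SofficeServer and the path you pass in are assumptions for illustration. The double-dash options are the ones current LibreOffice builds understand; as far as I know, older OpenOffice builds used single-dash forms.

```java
import java.io.IOException;
import java.util.List;

// Hypothetical helper for starting a listening office process.
public class SofficeServer {

    /** Builds the command line for a headless office instance listening on the given port. */
    public static List<String> command(String sofficePath, int port) {
        return List.of(
                sofficePath,
                "--headless",   // no GUI
                "--norestore",  // skip crash-recovery dialogs
                "--accept=socket,host=127.0.0.1,port=" + port + ";urp;");
    }

    /** Starts the process; the caller should destroy it once conversions are done. */
    public static Process start(String sofficePath, int port) throws IOException {
        return new ProcessBuilder(command(sofficePath, port)).inheritIO().start();
    }
}
```

For example, SofficeServer.start("soffice", 8100) before setupStatic() runs, assuming soffice is on the PATH.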

I needed the same conversion. After a lot of research I found that JODConverter can be useful for it; you can download the jar from https://code.google.com/p/jodconverter/downloads/list

Add the jodconverter-core-3.0-beta-4.jar file (the compiled jar, not the -sources one) to your project's lib folder.

// 1) Create the OfficeManager object
OfficeManager officeManager = new DefaultOfficeManagerConfiguration()
        .setOfficeHome(new File("/opt/libreoffice4.4"))
        .buildOfficeManager();
officeManager.start();
try {
    // 2) Create the JODConverter converter
    OfficeDocumentConverter converter = new OfficeDocumentConverter(officeManager);
    // 3) Create the DocumentFormat for docx
    DocumentFormat docx = converter.getFormatRegistry().getFormatByExtension("docx");
    docx.setStoreProperties(DocumentFamily.TEXT,
            Collections.singletonMap("FilterName", "MS Word 2007 XML"));
    // 4) Call the convert function on the converter object
    converter.convert(new File("doc/AdvancedTable.doc"),
            new File("docx/AdvancedTable.docx"), docx);
} finally {
    // Stop the office process started above so it does not linger
    officeManager.stop();
}

Use the newer jars jodconverter-core-4.2.2.jar and jodconverter-local-4.2.2.jar:

// Placeholder paths; a literal "*.doc" would not be expanded as a wildcard
String inputFile = "input.doc";
String outputFile = "output.docx";

LocalOfficeManager localOfficeManager = LocalOfficeManager.builder()
            .install()
            .officeHome(getDefaultOfficeHome()) //your path to openoffice
            .build();

  try {
      localOfficeManager.start();
      final DocumentFormat format
              = DocumentFormat.builder()
                      .from(DefaultDocumentFormatRegistry.DOCX)
                      .build();

      LocalConverter
              .make()
              .convert(new FileInputStream(new File(inputFile)))
              .as(DefaultDocumentFormatRegistry.getFormatByMediaType("application/msword"))
              .to(new File(outputFile))
              .as(format)
              .execute();

  } catch (OfficeException ex) {
      Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
  } catch (FileNotFoundException ex) {
      Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
  } finally {
      OfficeUtils.stopQuietly(localOfficeManager);
  }
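
The converters in this thread work on one file at a time, so converting a whole folder needs a list of input/output pairs first. Below is a minimal JDK-only sketch that collects every .doc file in a directory and derives a same-named .docx target. The class name DocBatchPlanner is hypothetical, and the actual conversion call is left to whichever library you use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: plans conversions, does not convert anything itself.
public class DocBatchPlanner {

    /** Maps every *.doc file in sourceDir to a same-named *.docx path in targetDir. */
    public static Map<Path, Path> plan(Path sourceDir, Path targetDir) throws IOException {
        Map<Path, Path> jobs = new TreeMap<>();
        // The glob must match the whole file name, so "*.doc" skips "*.docx" files.
        try (DirectoryStream<Path> docs = Files.newDirectoryStream(sourceDir, "*.doc")) {
            for (Path doc : docs) {
                String name = doc.getFileName().toString();
                // Drop the 4-character ".doc" suffix and append ".docx".
                String base = name.substring(0, name.length() - 4);
                jobs.put(doc, targetDir.resolve(base + ".docx"));
            }
        }
        return jobs;
    }
}
```

Each entry's key and value can then be passed to the converter as the input and output file.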

To convert a DOC file to HTML, see the question "Convert Word doc to HTML programmatically in Java".

Use this: http://poi.apache.org/

Or use this:

XWPFDocument docx = new XWPFDocument(OPCPackage.openOrCreate(new File("hello.docx")));  
XWPFWordExtractor wx = new XWPFWordExtractor(docx);  
String text = wx.getText();  
System.out.println("text = " + text);

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class TestCon {

    public static void main(String[] args) {
        System.out.println("Starting the test");
        // Read the binary .doc with HWPF and copy its paragraphs into a new .docx with XWPF.
        // Note: only the plain text is carried over; styles, tables and images are lost.
        try (InputStream in = new FileInputStream("C:/Users/312845/Desktop/a.doc");
             WordExtractor we = new WordExtractor(in);
             XWPFDocument docx = new XWPFDocument();
             OutputStream out = new FileOutputStream("C:/Users/312845/Desktop/test.docx")) {

            for (String paragraph : we.getParagraphText()) {
                // Strip field codes and surrounding whitespace (including the paragraph mark)
                String text = WordExtractor.stripFields(paragraph).trim();
                docx.createParagraph().createRun().setText(text);
            }
            docx.write(out);
            System.out.println("Document testing completed");
        } catch (Exception e) {
            System.out.println("Exception during test");
            e.printStackTrace();
        }
    }
}
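
Since POI uses HWPF for binary .doc files and XWPF for .docx files, it can help to check what a file really is before choosing, because extensions are not always reliable. Below is a small JDK-only sketch (Java 9+) that looks at the leading bytes: binary Word files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, while DOCX files are ZIP packages starting with "PK". The class and enum names are made up for illustration. The check only identifies the container, so an .xls file or an arbitrary ZIP would pass as well.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: identifies the container format from the file header.
public class WordFormatSniffer {

    public enum Kind { OLE2_BINARY, ZIP_PACKAGE, UNKNOWN }

    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    public static Kind sniff(Path file) throws IOException {
        byte[] head = new byte[8];
        int read;
        try (InputStream in = Files.newInputStream(file)) {
            // Read up to 8 bytes; short files simply yield fewer.
            read = in.readNBytes(head, 0, head.length);
        }
        if (startsWith(head, read, OLE2)) return Kind.OLE2_BINARY;
        if (startsWith(head, read, ZIP)) return Kind.ZIP_PACKAGE;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] head, int read, int[] signature) {
        if (read < signature.length) return false;
        for (int i = 0; i < signature.length; i++) {
            if ((head[i] & 0xFF) != signature[i]) return false;
        }
        return true;
    }
}
```

Files reported as OLE2_BINARY are the ones that need converting; ZIP_PACKAGE files can go straight to the DOCX handling.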
