
Convert Doc or Docx into HTML in Java

How can I convert a .doc or .docx file to HTML in Java? Using Apache POI, I was able to convert .doc to HTML, but I am unable to convert .docx. Could you show me sample code? The code below works with .doc but not with .docx:

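        import java.io.ByteArrayOutputStream;
        import java.nio.charset.StandardCharsets;
        import javax.xml.parsers.DocumentBuilderFactory;
        import javax.xml.transform.OutputKeys;
        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.dom.DOMSource;
        import javax.xml.transform.stream.StreamResult;
        import org.apache.poi.hwpf.HWPFDocumentCore;
        import org.apache.poi.hwpf.converter.WordToHtmlConverter;
        import org.apache.poi.hwpf.converter.WordToHtmlUtils;
        import org.w3c.dom.Document;

        // Load the binary .doc file (HWPF handles the OLE2 format only).
        HWPFDocumentCore wordDocument = WordToHtmlUtils.loadDoc(stream);

        // Convert the Word document into an in-memory HTML DOM.
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(
                DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument());
        wordToHtmlConverter.processDocument(wordDocument);
        Document htmlDocument = wordToHtmlConverter.getDocument();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(out);

        // Serialize the DOM as indented UTF-8 HTML.
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        out.close();

        // Decode with the same charset used for serialization, not the platform default.
        String result = new String(out.toByteArray(), StandardCharsets.UTF_8);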

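There is no reason why this can't work. Note, however, that the HWPF classes in your snippet (HWPFDocumentCore, WordToHtmlConverter) only read the binary .doc format. A .docx file is a ZIP-based OOXML package, which POI reads through its XWPF classes (for example XWPFDocument), so the same code path cannot load it.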

Please review the following:

In short, make sure you're using an up-to-date version of POI, and have all of the required libraries.
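To make the .doc versus .docx difference concrete, here is a minimal sketch that guesses the container format from the first bytes of a file, so the caller can pick the HWPF or the XWPF code path. `WordFormatSniffer` and its `Format` enum are hypothetical names for this illustration and are not part of Apache POI. It uses only the JDK and only checks file signatures, so it cannot tell a .docx from any other ZIP file, or a .doc from any other OLE2 file.

```java
/** Hypothetical helper (not a POI class): guesses the container format from leading bytes. */
final class WordFormatSniffer {

    enum Format { DOC_OLE2, DOCX_ZIP, UNKNOWN }

    // OLE2 compound file signature, used by binary .doc files (and also .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature "PK\003\004", used by .docx (and any other ZIP).
    private static final byte[] ZIP_MAGIC = { 0x50, 0x4B, 0x03, 0x04 };

    /** Returns the likely format based on the first bytes of the file. */
    static Format detect(byte[] data) {
        if (startsWith(data, OLE2_MAGIC)) {
            return Format.DOC_OLE2;
        }
        if (startsWith(data, ZIP_MAGIC)) {
            return Format.DOCX_ZIP;
        }
        return Format.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If `detect` reports `DOCX_ZIP`, the HWPF-based snippet from the question is the wrong tool for that file, and an XWPF-based path is needed instead.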

(If you need additional assistance, please explain what isn't working. Are you getting compile-time errors? Run-time errors? Unexpected output?)
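For reference, a rough sketch of what a .docx to HTML conversion involves, written against the JDK only rather than POI. It assumes the usual OOXML layout, where the body text lives in the ZIP entry `word/document.xml` as `w:p` paragraphs containing `w:t` text runs. `MinimalDocxToHtml` is a hypothetical class name. The sketch emits one `<p>` per paragraph and ignores styles, tables, images, and lists, so treat it as an illustration of the file structure, not a replacement for a real converter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical, text-only converter: one HTML <p> per WordprocessingML paragraph. */
final class MinimalDocxToHtml {

    private static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    static String convert(InputStream docx) throws Exception {
        byte[] xml = readEntry(docx, "word/document.xml");
        if (xml == null) {
            throw new IOException("word/document.xml not found; is this a .docx file?");
        }
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Refuse DOCTYPE declarations to avoid XXE when parsing untrusted files.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        StringBuilder html = new StringBuilder("<html><body>\n");
        NodeList paragraphs = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paragraphs.getLength(); i++) {
            Element paragraph = (Element) paragraphs.item(i);
            // Concatenate all text runs of this paragraph.
            NodeList texts = paragraph.getElementsByTagNameNS(W_NS, "t");
            StringBuilder line = new StringBuilder();
            for (int j = 0; j < texts.getLength(); j++) {
                line.append(texts.item(j).getTextContent());
            }
            html.append("<p>").append(escape(line.toString())).append("</p>\n");
        }
        return html.append("</body></html>").toString();
    }

    /** Reads one named entry from the ZIP stream, or returns null if it is absent. */
    private static byte[] readEntry(InputStream in, String name) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (name.equals(entry.getName())) {
                    ByteArrayOutputStream buf = new ByteArrayOutputStream();
                    byte[] chunk = new byte[8192];
                    int n;
                    while ((n = zip.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                    return buf.toByteArray();
                }
            }
        }
        return null;
    }

    /** Escapes the characters that are significant in HTML text content. */
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

Calling `MinimalDocxToHtml.convert(new FileInputStream("input.docx"))` (the file name is a placeholder) returns the HTML as a string. Paragraphs nested inside other paragraphs, such as text boxes, would be emitted more than once by this simple traversal.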
