How to convert docx to html file using open xml with formatting

I know there are a lot of questions with the same title, but none of them gave me a working approach for the issue I am having.

I am using Open XML SDK 2.5 along with PowerTools for Open XML to convert a .docx file to an .html file. The conversion uses the HtmlConverter class.

I can successfully convert the docx file into an HTML file, but the problem is that the HTML file doesn't retain the original formatting of the document. For example, font size, color, underline, bold, etc. are not reflected in the HTML file.

Here is my existing code:

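using System.IO;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

public void ConvertDocxToHtml(string fileName)
{
   // Copy the file into an expandable in-memory stream so the original is not modified.
   byte[] byteArray = File.ReadAllBytes(fileName);
   using (MemoryStream memoryStream = new MemoryStream())
   {
      memoryStream.Write(byteArray, 0, byteArray.Length);
      // Open as editable: the converter may modify the in-memory copy.
      using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStream, true))
      {
         HtmlConverterSettings settings = new HtmlConverterSettings()
         {
            PageTitle = "My Page Title"
         };
         // Convert the document to an XHTML element tree and write it to disk.
         XElement html = HtmlConverter.ConvertToHtml(doc, settings);
         File.WriteAllText(@"E:\Test.html", html.ToStringNewLineOnAttributes());
      }
   }
}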

So I just want to know if there is any way to retain the formatting in the converted HTML file.

I know about some third-party APIs that do the same thing, but I would prefer a way to do this using Open XML or any other open source library.

PowerTools for Open XML just released a new HtmlConverter module. It now contains a free, open source implementation of a conversion from DOCX to HTML formatted with CSS. The module HtmlConverter.cs supports all paragraph, character, and table styles, fonts and text formatting, numbered and bulleted lists, images, and more. See http://bit.ly/1bclyg9
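As a rough illustration of what using the newer module might look like, here is a minimal sketch. It assumes a PowerTools for Open XML release whose HtmlConverterSettings exposes FabricateCssClasses, CssClassPrefix, AdditionalCss, RestrictToSupportedLanguages and RestrictToSupportedNumberingFormats (these names follow the PowerTools sample code as I recall it and may differ between versions). The class name DocxToHtml, the method name, the "pt-" prefix and the CSS rule are made up for the example. Images are not handled here; the converter takes an ImageHandler callback for that, and I believe images are skipped when it is not set.

```csharp
using System.IO;
using System.Text;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;
using OpenXmlPowerTools;

// Hypothetical helper class; not part of PowerTools itself.
public static class DocxToHtml
{
    public static void Convert(string docxPath, string htmlPath)
    {
        // Work on an in-memory copy so the source file is never modified.
        byte[] bytes = File.ReadAllBytes(docxPath);
        using (var stream = new MemoryStream())
        {
            stream.Write(bytes, 0, bytes.Length);
            using (WordprocessingDocument doc = WordprocessingDocument.Open(stream, true))
            {
                // Assumed property names; check them against your PowerTools version.
                var settings = new HtmlConverterSettings
                {
                    PageTitle = Path.GetFileNameWithoutExtension(docxPath),
                    FabricateCssClasses = true,   // emit CSS classes instead of bare markup
                    CssClassPrefix = "pt-",       // illustrative prefix for generated classes
                    AdditionalCss = "body { margin: 1cm auto; max-width: 20cm; }",
                    RestrictToSupportedLanguages = false,
                    RestrictToSupportedNumberingFormats = false
                };

                XElement html = HtmlConverter.ConvertToHtml(doc, settings);

                // Wrap in a document with an HTML5 doctype and save without reformatting,
                // since extra whitespace can change how inline text renders.
                var page = new XDocument(new XDocumentType("html", null, null, null), html);
                File.WriteAllText(htmlPath, page.ToString(SaveOptions.DisableFormatting), Encoding.UTF8);
            }
        }
    }
}
```

A call would look like DocxToHtml.Convert(@"E:\Test.docx", @"E:\Test.html"); both paths are placeholders.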

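You may want to look for an external tool to help you with this, such as Aspose Words.

Your end result will not look exactly like your Word document, but this link might help.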

You can use the OpenXML Viewer extension for Firefox to convert with formatting: http://openxmlviewer.codeplex.com. This works for me. Hope this helps.
