
C#: docx generated from HTML by Open XML PowerTools throws "pre-release version of Word 2007" error

I'm writing code that gets the content of a .docx file as HTML using the Open XML PowerTools, and now I want to convert it back into another .docx file. The step that gets the content as HTML works fine, but when I generate the .docx file from that HTML, the file cannot be opened and Word shows this error:

This file was created in a pre-release version of Word 2007 and cannot be opened in this version.

The HTML generated from the test .docx is:

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta
      charset="UTF-8" />
    <title>My Page Title</title>
    <meta
      name="Generator"
      content="PowerTools for Open XML" />
    <style>span { white-space: pre-wrap; }
p.pt-Normal {
    line-height: 107.9%;
    margin-bottom: 8pt;
    text-align: justify;
    font-family: ;
    font-size: 11pt;
    margin-top: 0;
    margin-left: 0;
    margin-right: 0;
}
span.pt-DefaultParagraphFont {
    font-family: ;
    font-size: 11pt;
    font-style: normal;
    font-weight: normal;
    margin: 0;
    padding: 0;
}
span.pt-DefaultParagraphFont-000000 {
    font-family: Calibri;
    font-size: 11pt;
    font-style: normal;
    font-weight: normal;
    margin: 0;
    padding: 0;
}
</style>
  </head>
  <body>
    <div>
      <p
        dir="rtl"
        class="pt-Normal">&#x200f;<span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;با سلام خدمت &#x200f;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&lt;&lt;&#x200f;</span><span
          class="pt-DefaultParagraphFont-000000">&#x200e;PERSONS.lname&#x200e;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&gt;&gt;&#x200f;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f; &#x200f;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&lt;&lt;&#x200f;</span><span
          class="pt-DefaultParagraphFont-000000">&#x200e;PERSONS.fname&#x200e;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&gt;&gt;&#x200f;</span></p>
      <p
        dir="rtl"
        class="pt-Normal">&#x200f;<span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;مدیر محترم &#x200f;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&lt;&lt;&#x200f;</span><span
          class="pt-DefaultParagraphFont-000000">&#x200e;OFFICE.name&#x200e;</span><span
          lang="fa-IR"
          class="pt-DefaultParagraphFont">&#x200f;&gt;&gt;&#x200f;</span></p>
    </div>
  </body>
</html>

And my code to save the above HTML as a .docx is:

using (WordprocessingDocument wordDoc =
    WordprocessingDocument.Create(dest_doc_path, WordprocessingDocumentType.Document))
{
    MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

    string htmlcontent = htmlTXT.Text;

    // Write the HTML text, encoded as UTF-8, directly into the main document part
    using (Stream stream = mainPart.GetStream())
    {
        byte[] buf = (new UTF8Encoding()).GetBytes(htmlcontent);
        stream.Write(buf, 0, buf.Length);
    }

    MessageBox.Show("DONE", "done", MessageBoxButton.OK);
}

The answer is simple: you must not insert HTML content into the MainDocumentPart, because that part is expected to contain a valid Open XML w:document element, e.g., the following simplified one:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r>
        <w:t>Hello, world!</w:t>
      </w:r>
    </w:p>
  </w:body>
</w:document>
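As a sketch of the fix, the main part can be built with the Open XML SDK's strongly typed classes instead of writing raw bytes, so the SDK serializes a w:document like the one above. This assumes the DocumentFormat.OpenXml NuGet package; `MinimalDocx`, `WriteParagraphs` and the plain-string input are hypothetical names, and formatting from the HTML (right-to-left direction, fonts) is not carried over.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: writes each string as its own paragraph in a new .docx.
public static class MinimalDocx
{
    public static void WriteParagraphs(string path, IEnumerable<string> lines)
    {
        using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Create(path, WordprocessingDocumentType.Document))
        {
            MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();

            // The main part must hold a w:document root with a w:body child.
            var body = new Body();
            foreach (string line in lines)
            {
                // w:p > w:r > w:t, the same nesting as the XML above;
                // Preserve keeps leading/trailing spaces in the text.
                var text = new Text(line) { Space = SpaceProcessingModeValues.Preserve };
                body.AppendChild(new Paragraph(new Run(text)));
            }

            mainPart.Document = new Document(body);
            mainPart.Document.Save();
        }
    }
}
```

Called as `MinimalDocx.WriteParagraphs(dest_doc_path, new[] { "Hello, world!" })`, this should produce a file Word can open, because the main part now contains WordprocessingML rather than HTML.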

The error message is probably a little misleading. The HTML is simply invalid in this place.
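Since a .docx is a ZIP package, the real problem can be confirmed by looking at the root element of the main part inside the generated file. A minimal sketch using only the base class library, assuming the main part was stored at the SDK's default location `word/document.xml`; `MainPartCheck` and its methods are hypothetical helpers.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml;
using System.Xml.Linq;

public static class MainPartCheck
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // True when the XML is well-formed and its root element is w:document.
    public static bool HasDocumentRoot(string xml)
    {
        try
        {
            XElement root = XDocument.Parse(xml).Root;
            return root != null && root.Name == W + "document";
        }
        catch (XmlException)
        {
            return false; // not well-formed XML at all
        }
    }

    // Reads word/document.xml out of the .docx package and checks its root.
    public static bool MainPartLooksValid(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null) return false;
            using (var reader = new StreamReader(entry.Open()))
            {
                return HasDocumentRoot(reader.ReadToEnd());
            }
        }
    }
}
```

For the file produced by the question's code, the root is the XHTML `html` element, so the check returns false even though the content parses as XML.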

Depending on whether you changed the HTML after creating it from the original Word document with the Open XML PowerTools, you will either have to transform it back into valid Open XML markup (if you changed it) or can simply use the Open XML markup from the original Word document (if you did not).
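For the case where the HTML was not edited, a sketch of reusing the original markup: copy the original package byte for byte, which keeps its styles, fonts and right-to-left settings, and make any further changes through the Open XML SDK instead of through HTML. This assumes the DocumentFormat.OpenXml package; `ReuseOriginal`, `CopyAndAppend` and the appended paragraph are hypothetical, standing in for whatever edit you actually need.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ReuseOriginal
{
    // Copies the original .docx, then appends one paragraph to the copy.
    public static void CopyAndAppend(string sourcePath, string destPath, string extraText)
    {
        // A byte-for-byte copy keeps the original WordprocessingML and its related parts.
        File.Copy(sourcePath, destPath, overwrite: true);

        // Open the copy for editing (isEditable: true); the original stays untouched.
        using (WordprocessingDocument doc = WordprocessingDocument.Open(destPath, true))
        {
            Body body = doc.MainDocumentPart.Document.Body;
            var paragraph = new Paragraph(new Run(new Text(extraText)));

            // A body-level w:sectPr must remain the last child of w:body.
            SectionProperties sectPr = body.GetFirstChild<SectionProperties>();
            if (sectPr != null)
                body.InsertBefore(paragraph, sectPr);
            else
                body.AppendChild(paragraph);

            doc.MainDocumentPart.Document.Save();
        }
    }
}
```

The new paragraph is inserted before the body-level w:sectPr because WordprocessingML expects that element to be the last child of w:body; appending after it would produce markup Word may reject.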
