
Convert wordml (xml) to XHTML/HTML

I'm currently working on a way to convert a WordML XML document (or rather its body part) into valid XHTML/HTML. The reason is that it contains a bunch of breaks, paragraphs and so on that I want to display properly in my WebForms application.

I've been searching for ways to do this for the past few hours, and the only thing I found that somewhat resembles my issue is the following blog post ( https://msdn.microsoft.com/en-us/library/ff628051(v=office.14).aspx#XHtml_Using ). The problem is that the transformation there is based on .docx and not on XML. I could try to convert the XML into a .docx and work with that, but that wouldn't really be an effective way to deal with it, not to mention that I'd have to find a way to convert the XML into .docx first.

I really hope that somebody out there can help me with this, because I'm somewhat out of ideas.

Thanks in advance, snap.

Example: the w:body element inside the XML looks like this:

<w:body xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
  <wx:sect xmlns:wx="http://schemas.microsoft.com/office/word/2003/auxHint">
    <w:p wsp:rsidR="00FF5F75" wsp:rsidRDefault="00626E80" xmlns:wsp="http://schemas.microsoft.com/office/word/2003/wordml/sp2">
      <w:r wsp:rsidRPr="00EA67E2">
        <w:rPr>
          <w:rFonts w:fareast="Times New Roman" />
          <w:sz w:val="26" />
          <w:sz-cs w:val="26" />
          <w:lang w:fareast="JA" />
        </w:rPr>
        <w:t>Leider können wir die Kosten für die Impfung gegen %</w:t>
      </w:r>
      <w:r wsp:rsidRPr="00EA67E2">
        <w:rPr>
          <w:rFonts w:fareast="Times New Roman" />
          <w:sz w:val="26" />
          <w:sz-cs w:val="26" />
          <w:highlight w:val="yellow" />
          <w:lang w:fareast="JA" />
        </w:rPr>
        <w:t>XY</w:t>
      </w:r>
      <w:r wsp:rsidRPr="00EA67E2">
        <w:rPr>
          <w:rFonts w:fareast="Times New Roman" />
          <w:sz w:val="26" />
          <w:sz-cs w:val="26" />
          <w:lang w:fareast="JA" />
        </w:rPr>
        <w:t>% nicht übernehmen.</w:t>
      </w:r>
      <w:r wsp:rsidRPr="00EA67E2">
        <w:rPr>
          <w:rFonts w:fareast="Times New Roman" />
          <w:sz w:val="26" />
          <w:sz-cs w:val="26" />
          <w:lang w:fareast="JA" />
        </w:rPr>
        <w:br />
      </w:r>
      <w:r wsp:rsidRPr="00EA67E2">
        <w:rPr>
          <w:rFonts w:fareast="Times New Roman" />
          <w:sz w:val="26" />
          <w:sz-cs w:val="26" />
          <w:lang w:fareast="JA" />
        </w:rPr>
        <w:br />
        <w:t>Die DAK-Gesundheit zahlt Ihnen die Impfungen, die in den Schutzimpfungs-Richtlinien des Gemeinsamen Bundesausschusses genannt sind. Die Impfung gegen %</w:t>
      </w:r>
....

In a regular Word document, where this XML is used as part of an add-in, Word displays these elements as line breaks and so on. What I want is to convert these elements to proper HTML/XHTML.

Try this, which renders the XML as a nested HTML table:

// Required namespaces: System, System.Net, System.Text, System.Xml.Linq

// Renders each child of the XML root as a table row: element name and
// attributes in the first cell, text or a nested table in the second.
protected string ConvertXmlToHtmlTable(string xml)
{
  StringBuilder html = new StringBuilder("<table align='center' " +
     "border='1' class='xmlTable'>\r\n");
  try
  {
      XElement root = XDocument.Parse(xml).Root;

      foreach (XElement ele in root.Elements())
      {
          // Element name followed by one "name=value" line per attribute.
          string elename = ele.Name.ToString();
          foreach (XAttribute attrib in ele.Attributes())
          {
              elename += Environment.NewLine + attrib.Name + "=" + attrib.Value;
          }

          // Leaf elements show their encoded text; others recurse into a nested table.
          string content = ele.HasElements
              ? ConvertXmlToHtmlTable(ele.ToString())
              : WebUtility.HtmlEncode(ele.Value);

          html.Append("<tr>");
          html.Append("<td>" + WebUtility.HtmlEncode(elename) + "</td>");
          html.Append("<td>" + content + "</td>");
          html.Append("</tr>");
      }

      html.Append("</table>");
  }
  catch (Exception)
  {
      // Return the original string in case of an error.
      return xml;
  }
  return html.ToString();
}
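
The method above dumps the XML structure as a table. For the mapping the question actually asks about (w:p to a paragraph, w:br to a line break, w:t to text), a minimal LINQ to XML sketch could look like the following. The class name WordMlBodyToXhtml, the method ToXhtml, the wrapping div and the "highlight" CSS class are all assumptions made for illustration; only the w: namespace URI and the element names come from the sample in the question. It ignores fonts, sizes, tables, lists and runs nested inside other elements, so treat it as a starting point rather than a full converter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class WordMlBodyToXhtml
{
    // WordprocessingML 2003 namespace, as declared on w:body in the question.
    private static readonly XNamespace W =
        "http://schemas.microsoft.com/office/word/2003/wordml";

    // Converts the markup of a complete, well-formed w:body element into
    // a <div> holding one <p> per w:p.
    public static string ToXhtml(string bodyXml)
    {
        XElement body = XElement.Parse(bodyXml);

        // Descendants() is used because w:p may sit inside wx:sect wrappers.
        var div = new XElement("div",
            body.Descendants(W + "p").Select(p =>
                new XElement("p", p.Elements(W + "r").Select(ConvertRun))));

        // XElement escapes text content itself, so the result stays well-formed.
        return div.ToString(SaveOptions.DisableFormatting);
    }

    // Maps one w:r run to XHTML content.
    private static object ConvertRun(XElement run)
    {
        // w:t becomes a text node, w:br becomes <br />; other children are skipped.
        List<object> content = run.Elements()
            .Where(e => e.Name == W + "t" || e.Name == W + "br")
            .Select(e => e.Name == W + "t"
                ? (object)new XText(e.Value)
                : new XElement("br"))
            .ToList();

        // Assumption: any w:highlight in w:rPr maps to a span with a
        // hypothetical "highlight" CSS class, regardless of its w:val color.
        bool highlighted =
            run.Elements(W + "rPr").Elements(W + "highlight").Any();

        return highlighted
            ? new XElement("span", new XAttribute("class", "highlight"), content)
            : (object)content;
    }
}
```

Calling WordMlBodyToXhtml.ToXhtml(bodyXml) with the complete w:body markup returns a single string that can be assigned to, for example, a Literal control in WebForms. Because the output is serialized XML, empty paragraphs come out as self-closing tags, which suits XHTML better than HTML served as text/html.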

There is a project on GitHub, also available as a NuGet package, called Open XML Power Tools. One of its features is high-fidelity conversion of DOCX to HTML/CSS. I haven't tried it out yet, but I might soon.

