I'm using this method to generate a docx file:
public static void CreateDocument(string documentFileName, string text)
{
    using (WordprocessingDocument wordDoc =
        WordprocessingDocument.Create(documentFileName, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = wordDoc.AddMainDocumentPart();
        string docXml =
@"<?xml version=""1.0"" encoding=""UTF-8"" standalone=""yes""?>
<w:document xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">
<w:body><w:p><w:r><w:t>#REPLACE#</w:t></w:r></w:p></w:body>
</w:document>";
        docXml = docXml.Replace("#REPLACE#", text);
        using (Stream stream = mainPart.GetStream())
        {
            byte[] buf = (new UTF8Encoding()).GetBytes(docXml);
            stream.Write(buf, 0, buf.Length);
        }
    }
}
It works like a charm:
CreateDocument("test.docx", "Hello");
But what if I want to pass HTML content instead of Hello? For example:
CreateDocument("test.docx", @"<html><head></head>
<body>
<h1>Hello</h1>
</body>
</html>");
Or something like this:
CreateDocument("test.docx", @"Hello<BR>
This is a simple text<BR>
Third paragraph<BR>
Sign
");
Both cases create an invalid structure for document.xml. Any ideas? How can I generate a docx file from HTML content?
I realize I'm 7 years late to the game here. Still, for future people searching for how to convert from HTML to a Word document, this blog posting on a Microsoft MSDN site gives most of the ingredients necessary to do this using OpenXML. I found the post itself confusing, but the source code that the author included clarified it all for me.
The only piece that was missing was how to build a docx file from scratch, instead of merging into an existing one as that example shows. I found that tidbit here.
Unfortunately, the project I used this in is written in VB.NET, so I'm going to share the VB.NET code first, then an automated C# conversion of it that may or may not be accurate.
VB.NET code:
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Wordprocessing
Imports System.IO

Dim ms As IO.MemoryStream
Dim mainPart As MainDocumentPart
Dim b As Body
Dim d As Document
Dim chunk As AlternativeFormatImportPart
Dim altChunk As AltChunk
Const altChunkID As String = "AltChunkId1"

ms = New MemoryStream()
Using myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document)
    mainPart = myDoc.MainDocumentPart
    If mainPart Is Nothing Then
        mainPart = myDoc.AddMainDocumentPart()
        b = New Body()
        d = New Document(b)
        d.Save(mainPart)
    End If
    chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID)
    Using chunkStream As Stream = chunk.GetStream(FileMode.Create, FileAccess.Write)
        Using stringStream As StreamWriter = New StreamWriter(chunkStream)
            stringStream.Write("YOUR HTML HERE")
        End Using
    End Using
    altChunk = New AltChunk()
    altChunk.Id = altChunkID
    mainPart.Document.Body.InsertAt(Of AltChunk)(altChunk, 0)
    mainPart.Document.Save()
End Using
C# code:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System.IO;

const string altChunkID = "AltChunkId1";
MemoryStream ms = new MemoryStream();
using (WordprocessingDocument myDoc = WordprocessingDocument.Create(ms, WordprocessingDocumentType.Document))
{
    MainDocumentPart mainPart = myDoc.MainDocumentPart;
    if (mainPart == null)
    {
        // A freshly created package has no main part yet, so add one with an empty body.
        mainPart = myDoc.AddMainDocumentPart();
        Body b = new Body();
        Document d = new Document(b);
        d.Save(mainPart);
    }
    // Store the HTML as an alternative-format part inside the package.
    AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Xhtml, altChunkID);
    using (Stream chunkStream = chunk.GetStream(FileMode.Create, FileAccess.Write))
    using (StreamWriter stringStream = new StreamWriter(chunkStream))
    {
        stringStream.Write("YOUR HTML HERE");
    }
    // Reference the stored part from the document body through its relationship ID.
    AltChunk altChunk = new AltChunk();
    altChunk.Id = altChunkID;
    mainPart.Document.Body.InsertAt(altChunk, 0);
    mainPart.Document.Save();
}
Note that I'm using the ms memory stream in another routine, which is where it's disposed of after use.
I hope this helps someone else!
You cannot just insert HTML content into "document.xml". That part expects only WordprocessingML content, so you'll have to convert the HTML into WordprocessingML; see this.
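To illustrate what "converting to WordprocessingML" means for the simplest case in the question (text separated by `<BR>`), here is a minimal sketch that uses only `System.Xml.Linq`. The class and method names (`PlainTextToWordXml`, `BuildDocumentXml`) are hypothetical, and the sketch assumes the input contains nothing but plain text and `<BR>` tags; real HTML such as headings or bold text needs a proper converter. Each piece becomes its own `w:p` paragraph, and building the XML through `XElement` escapes characters like `&` and `<` that would otherwise break document.xml.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class PlainTextToWordXml
{
    // Main WordprocessingML namespace, the same one used in the question's template.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: splits the input on <BR> tags (any casing, optional slash)
    // and emits one w:p / w:r / w:t triple per piece.
    public static string BuildDocumentXml(string text)
    {
        string[] pieces = Regex.Split(text, @"<br\s*/?>", RegexOptions.IgnoreCase);

        var body = new XElement(W + "body",
            pieces.Select(piece =>
                new XElement(W + "p",
                    new XElement(W + "r",
                        // XElement escapes special characters in the text for us.
                        new XElement(W + "t", piece.Trim())))));

        var doc = new XDocument(
            new XDeclaration("1.0", "UTF-8", "yes"),
            new XElement(W + "document",
                new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
                body));

        // XDocument.ToString() omits the XML declaration, so prepend it explicitly.
        return doc.Declaration.ToString() + Environment.NewLine + doc.ToString();
    }
}
```

The returned string could be written to the main part's stream in place of the question's docXml template, for example `BuildDocumentXml("Hello<BR>This is a simple text<BR>Sign")`.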
Another option is the altChunk element. With it you can place an HTML file inside your DOCX file and then reference that HTML content at a specific place inside your document; see this.
Lastly, as an alternative, with the GemBox.Document library you could accomplish exactly what you want:
public static void CreateDocument(string documentFileName, string text)
{
    DocumentModel document = new DocumentModel();
    document.Content.LoadText(text, LoadOptions.HtmlDefault);
    document.Save(documentFileName);
}
Or you could convert the HTML content directly into a DOCX file:
public static void Convert(string documentFileName, string htmlText)
{
    HtmlLoadOptions options = LoadOptions.HtmlDefault;
    using (var htmlStream = new MemoryStream(options.Encoding.GetBytes(htmlText)))
        DocumentModel.Load(htmlStream, options)
            .Save(documentFileName);
}
I was able to convert HTML content to a docx file using OpenXML in .NET Core with the code below. (The HtmlConverter class is not part of the Open XML SDK itself; it appears to come from the HtmlToOpenXml package.)
string html = "<strong>Hello</strong> World";
using (MemoryStream generatedDocument = new MemoryStream())
{
    using (WordprocessingDocument package =
        WordprocessingDocument.Create(generatedDocument,
            WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = package.MainDocumentPart;
        if (mainPart == null)
        {
            mainPart = package.AddMainDocumentPart();
            new Document(new Body()).Save(mainPart);
        }
        HtmlConverter converter = new HtmlConverter(mainPart);
        converter.ParseHtml(html);
        mainPart.Document.Save();
    }

    // To save to disk:
    System.IO.File.WriteAllBytes("filename.docx", generatedDocument.ToArray());

    // Alternatively, to return the file for download in ASP.NET Core MVC:
    return File(generatedDocument.ToArray(),
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
        "filename.docx");
}