
How can I form a Word document using stream of bytes

I have a stream of bytes that, if put together correctly, forms a valid Word file. I need to convert this stream into a Word document without writing it to disk. I get the original stream from a SQL Server database table:

ID   Name    FileData
----------------------------------------
1    Word1   292jf2jf2ofm29fj29fj29fj29f2jf29efj29fj2f9 (actual file data)

The FileData field holds the file's bytes.

Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
// Documents.Open returns the Document; there is no need to construct one first
Microsoft.Office.Interop.Word.Document doc = word.Documents.Open(@"C:\SampleText.doc");
doc.Activate();

The code above opens a Word file from the file system and fills the document from it. I don't want that: I want to create a new Microsoft.Office.Interop.Word.Document, but fill its content manually from the byte stream.

Once I have the in-memory Word document, I want to parse it for keywords.

Any ideas?

There are really only two ways to open a Word document programmatically: as a physical file or as a stream. There's also a "package", but that isn't really applicable here.

The stream method is covered here: https://docs.microsoft.com/en-us/office/open-xml/how-to-open-a-word-processing-document-from-a-stream

But even that example relies on a physical file to create the stream:

string strDoc = @"C:\Users\Public\Public Documents\Word13.docx";
Stream stream = File.Open(strDoc, FileMode.Open);
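The stream doesn't have to come from File.Open, though. A .docx is a ZIP package whose main text lives in the word/document.xml entry, so for read-only keyword parsing (the goal in the question) the bytes from the FileData column can be inspected entirely in memory. The sketch below swaps in the BCL's System.IO.Compression.ZipArchive instead of Interop or the Open XML SDK; DocxKeywords, GetParagraphTexts and ContainsKeyword are hypothetical names, and it assumes a .docx (not a legacy .doc) whose text sits in w:t elements. Text runs are joined per paragraph because Word often splits a single word across several runs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;

// Hypothetical helper: read-only keyword search over .docx bytes, no disk I/O.
public static class DocxKeywords
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the plain text of each paragraph (w:p) in word/document.xml.
    public static List<string> GetParagraphTexts(byte[] docxBytes)
    {
        var result = new List<string>();
        using (var mem = new MemoryStream(docxBytes, false))
        using (var zip = new ZipArchive(mem, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                ?? throw new InvalidDataException("Not a .docx: word/document.xml is missing.");

            var xml = new XmlDocument();
            using (Stream s = entry.Open())
            {
                xml.Load(s);
            }

            var ns = new XmlNamespaceManager(xml.NameTable);
            ns.AddNamespace("w", W);

            foreach (XmlNode p in xml.SelectNodes("//w:body//w:p", ns))
            {
                // Join all text runs so words split across runs stay intact.
                result.Add(string.Concat(
                    p.SelectNodes(".//w:t", ns).Cast<XmlNode>().Select(t => t.InnerText)));
            }
        }
        return result;
    }

    // Case-insensitive keyword check across all paragraphs.
    public static bool ContainsKeyword(byte[] docxBytes, string keyword) =>
        GetParagraphTexts(docxBytes).Any(
            text => text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0);
}
```

Usage would be something like DocxKeywords.ContainsKeyword(fileData, "invoice"), where fileData is the byte[] read from the table.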

The best solution I can offer is to write the file to a temporary location where the application's service account has write permission:

string newDocument = @"C:\temp\test.docx";
WriteFile(byteArray, newDocument);

If the account doesn't have permissions on the "temp" folder in my example, simply grant your application's service account (the application pool identity, if it's a website) Full Control of that folder.

You'd use this WriteFile() function:

/// <summary>
/// Write a byte[] to a new file at the location you choose
/// </summary>
/// <param name="byteArray">byte[] that contains the file data</param>
/// <param name="newDocument">Path where the new document will be written</param>
public static void WriteFile(byte[] byteArray, string newDocument)
{
    // Requires: using System.IO;
    // The bytes already are the complete file, so no MemoryStream copy is needed.
    // File.WriteAllBytes creates the file, or overwrites it if it already exists.
    File.WriteAllBytes(newDocument, byteArray);
}
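Rather than a fixed C:\temp folder, a unique file under the current user's temp directory can avoid collisions between concurrent requests and, in many setups, extra permission changes. A minimal sketch, assuming the hypothetical TempDoc.WithTempFile helper below: it writes the bytes, hands the path to the caller's code (for example a Documents.Open call), reads the possibly modified bytes back, and always deletes the file.

```csharp
using System;
using System.IO;

// Hypothetical helper: round-trips bytes through a short-lived temp file.
public static class TempDoc
{
    // Writes bytes to a unique temp file, lets `work` use or modify it by path,
    // then returns the file's (possibly updated) bytes and deletes the file.
    public static byte[] WithTempFile(byte[] bytes, Action<string> work, string extension = ".docx")
    {
        string path = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
        File.WriteAllBytes(path, bytes);
        try
        {
            work(path);
            return File.ReadAllBytes(path);
        }
        finally
        {
            // File.Delete does not throw if the file is already gone.
            File.Delete(path);
        }
    }
}
```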

From there, you can open it with OpenXML and edit the file. There's no way to open a Word document in byte[] form directly into an instance of Word (Interop, OpenXML, or otherwise), because you need a documentPath, or the stream method mentioned earlier, which relies on there being a physical file. You can edit the content by reading the main document part's bytes into a string and then into XML, or by editing the string directly:

// Requires: using System.IO; using System.Xml; using DocumentFormat.OpenXml.Packaging;
string docText = null;
byte[] byteArray = null;
using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(documentPath, true))
{
    using (StreamReader sr = new StreamReader(wordDoc.MainDocumentPart.GetStream()))
    {
        docText = sr.ReadToEnd();  // <-- reads the main part's XML bytes into a string
    }

    // Option 1: play with the XML
    XmlDocument xml = new XmlDocument();
    xml.LoadXml(docText);  // the string contains the XML of the Word document

    XmlNodeList nodes = xml.GetElementsByTagName("w:body");
    XmlNode chiefBodyNode = nodes[0];
    // add paragraphs with AppendChild...
    // remove a node by getting a ChildNode and removing it, like this...
    XmlNode firstParagraph = chiefBodyNode.ChildNodes[0];  // index 0 is the first child
    chiefBodyNode.RemoveChild(firstParagraph);
    docText = xml.OuterXml;  // write the XML edits back to the string

    // Option 2: play with the string form (applied on top of any XML edits)
    docText = docText.Replace("John", "Joe");

    // Save the part - yes, back to the file system - required
    using (StreamWriter sw = new StreamWriter(wordDoc.MainDocumentPart.GetStream(FileMode.Create)))
    {
        sw.Write(docText);
    }
}

// Read it back in as bytes (the package is saved once wordDoc is disposed)
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving
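Note that docText.Replace("John","Joe") works on the raw XML, so it can also hit element names or attribute values, and it misses a word that Word has split across runs. A slightly safer sketch, assuming the same docText string as above, edits only the contents of w:t elements via XmlDocument; DocXmlEdit.ReplaceInTextNodes is a hypothetical name, and split runs are still not handled.

```csharp
using System.Xml;

// Hypothetical helper: text replacement limited to w:t (text) elements.
public static class DocXmlEdit
{
    const string W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces text only inside <w:t> elements, so element names, attributes
    // and other markup in document.xml are never altered by accident.
    public static string ReplaceInTextNodes(string docText, string from, string to)
    {
        var xml = new XmlDocument { PreserveWhitespace = true };
        xml.LoadXml(docText);

        var ns = new XmlNamespaceManager(xml.NameTable);
        ns.AddNamespace("w", W);

        foreach (XmlNode t in xml.SelectNodes("//w:t", ns))
        {
            t.InnerText = t.InnerText.Replace(from, to);
        }
        return xml.OuterXml;
    }
}
```

The result would be written back with sw.Write(...) exactly as in the code above.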

Reference:

https://docs.microsoft.com/en-us/office/open-xml/how-to-search-and-replace-text-in-a-document-part

I know it's not ideal, but I have searched and not found a way to edit the byte[] directly without a conversion that involves writing out the file, opening it in Word for the edits, and then essentially re-uploading it to recover the new bytes. Doing byte[] byteArray = Encoding.UTF8.GetBytes(docText); before re-reading the file corrupts the bytes, as did every other Encoding I tried (UTF7, Default, Unicode, ASCII); I found this when I tried to write them back out using my WriteFile() function above. When the bytes were not encoded, but simply collected using File.ReadAllBytes() and then written back out using WriteFile(), it worked fine.
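That corruption is expected. docText holds only the XML of the main document part, not the whole ZIP package, so its bytes alone can never form a valid .docx; and, more generally, a .docx is binary data, so passing it through a text encoding is lossy. The sketch below (hypothetical EncodingRoundTrip class) shows that invalid UTF-8 sequences are replaced with U+FFFD when decoded, while Base64 round-trips arbitrary bytes if a string form is ever needed.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical demo: which byte[] -> string -> byte[] round trips are lossless.
public static class EncodingRoundTrip
{
    // Binary data is not valid UTF-8 in general; decoding replaces invalid
    // sequences with U+FFFD, so re-encoding cannot restore the original bytes.
    public static bool SurvivesUtf8(byte[] data) =>
        Encoding.UTF8.GetBytes(Encoding.UTF8.GetString(data)).SequenceEqual(data);

    // Base64 maps every byte sequence to a string and back without loss.
    public static bool SurvivesBase64(byte[] data) =>
        Convert.FromBase64String(Convert.ToBase64String(data)).SequenceEqual(data);
}
```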

Update:

It might be possible to manipulate the bytes like this:

// Requires: using System.IO; using DocumentFormat.OpenXml.Packaging;
//byte[] byteArray = File.ReadAllBytes("Test.docx"); // you might be able to assign your bytes here, instead of from a file?
byte[] byteArray = GetByteArrayFromDatabase(fileId); // function you have for getting the document from the database
// You would update "documentPath" to a new name first...
// (declared outside the using block so it is still in scope at the end)
string documentPath = @"C:\temp\newDoc.docx";
using (MemoryStream mem = new MemoryStream())
{
    // Copy into an expandable stream; new MemoryStream(byteArray) would be fixed-size
    mem.Write(byteArray, 0, byteArray.Length);
    using (WordprocessingDocument wordDoc =
            WordprocessingDocument.Open(mem, true))
    {
        // do your updates -- see string or XML edits, above

        // Once done, you may need to save the changes....
        //wordDoc.MainDocumentPart.Document.Save();
    }

    // But you will still need to save it to the file system here....
    // Note: FileMode.CreateNew throws if a file already exists at documentPath
    using (FileStream fileStream = new FileStream(documentPath,
            System.IO.FileMode.CreateNew))
    {
        mem.WriteTo(fileStream);
    }
}

// And then read the bytes back in, to save it to the database
byteArray = File.ReadAllBytes(documentPath); // new bytes, ready for DB saving

Reference:

https://docs.microsoft.com/en-us/previous-versions/office/office-12//ee945362(v=office.12)

But note that even this method requires saving the document and then reading it back in to get the bytes for the database. It will also fail on the line where the document is opened if the document is in .doc format instead of .docx.
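Since WordprocessingDocument.Open() only understands the Open XML (.docx) format, it can help to check the first bytes before opening. A minimal sketch with a hypothetical WordFormat class: a .docx starts with the ZIP signature PK\x03\x04, and a legacy .doc is an OLE2 compound file starting with D0 CF 11 E0 A1 B1 1A E1. A ZIP signature only shows the data is a ZIP archive, so an .xlsx or a plain .zip would also pass.

```csharp
using System.Linq;

// Hypothetical helper: sniff the container format from the leading bytes.
public static class WordFormat
{
    // Office Open XML files (.docx) are ZIP packages: "PK\x03\x04".
    static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };

    // Legacy .doc files are OLE2 compound files with this 8-byte header.
    static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    static bool StartsWith(byte[] data, byte[] prefix) =>
        data != null && data.Length >= prefix.Length &&
        data.Take(prefix.Length).SequenceEqual(prefix);

    public static bool LooksLikeDocx(byte[] data) => StartsWith(data, ZipSignature);

    public static bool LooksLikeLegacyDoc(byte[] data) => StartsWith(data, OleSignature);
}
```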

Instead of that last section that saves the file to the file system, you could just take the memory stream and convert it back into bytes once you are outside the WordprocessingDocument.Open() block, but still inside the using (MemoryStream mem = new MemoryStream()) { ... } statement:

// Convert
byteArray = mem.ToArray();

This gives you your Word document as a byte[].
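For comparison, the same byte[] in, byte[] out round trip can be sketched without the Open XML SDK by treating the .docx as a ZIP archive. This swaps in System.IO.Compression.ZipArchive in Update mode; DocxInMemory.EditMainPart is a hypothetical helper that applies a string-to-string edit to word/document.xml. As in the answer, the bytes are copied into an expandable MemoryStream, and ToArray() is called only after the archive has been disposed, because disposing is what writes the updated package.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: edit the main document part entirely in memory.
public static class DocxInMemory
{
    public static byte[] EditMainPart(byte[] docxBytes, Func<string, string> edit)
    {
        using (var mem = new MemoryStream())
        {
            // Expandable copy: the edited package may be larger than the input.
            mem.Write(docxBytes, 0, docxBytes.Length);

            // leaveOpen: true keeps `mem` usable after the archive is disposed.
            using (var zip = new ZipArchive(mem, ZipArchiveMode.Update, leaveOpen: true))
            {
                ZipArchiveEntry entry = zip.GetEntry("word/document.xml")
                    ?? throw new InvalidDataException("word/document.xml not found.");

                string xml;
                using (var reader = new StreamReader(entry.Open()))
                {
                    xml = reader.ReadToEnd();
                }

                // Replace the entry with the edited XML (UTF-8 without BOM).
                entry.Delete();
                ZipArchiveEntry replacement = zip.CreateEntry("word/document.xml");
                using (var writer = new StreamWriter(replacement.Open(), new UTF8Encoding(false)))
                {
                    writer.Write(edit(xml));
                }
            } // disposing the archive writes the updated ZIP into `mem`

            return mem.ToArray();
        }
    }
}
```

Something like byte[] updated = DocxInMemory.EditMainPart(fileData, x => x.Replace("John", "Joe")); would then give bytes ready to save back to the database, with the same raw-string caveats as the Replace shown earlier.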

  1. Create an in-memory file system; there are drivers for that.
  2. Give Word a path on an FTP server (or something similar), which you then use to push the data.

One important thing to note: storing files in a database is generally not good design.

There probably isn't any straightforward way of doing this. I found a couple of solutions while searching for it:

I don't know if this does it for you, but apparently the API doesn't provide what you're after (unfortunately).

You could look at how SharePoint solves this. They have created a web interface for documents stored in their database.

It's not that hard to create or embed a web server in your application that can serve pages to Word. You don't even have to use the standard ports.

Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original URL. For any questions, contact yoyou2525@163.com.