Word OpenXml Word Found Unreadable Content

We are trying to manipulate a Word document to remove a paragraph based on certain conditions, but the file produced is always corrupted. Opening it in Word fails with the error:

Word found unreadable content

The code below corrupts the file, but if we remove the line:

Document document = mdp.Document;

then the file is saved and opens without issue. Is there an obvious issue that I am missing?

var readAllBytes = File.ReadAllBytes(@"C:\Original.docx");

using (var stream = new MemoryStream(readAllBytes))
{
    using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
    {
        MainDocumentPart mdp = wpd.MainDocumentPart;
        Document document = mdp.Document;
    }
}

File.WriteAllBytes(@"C:\New.docx", readAllBytes);

UPDATE:

using (WordprocessingDocument wpd = WordprocessingDocument.Open(@"C:\Original.docx", true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;

    document.Save();
}

Running the code above on a physical file, we can still open Original.docx without the error, so the problem seems limited to modifying a stream.

Here's a method that reads a document into a MemoryStream:

public static MemoryStream ReadAllBytesToMemoryStream(string path)
{
    byte[] buffer = File.ReadAllBytes(path);
    var destStream = new MemoryStream(buffer.Length);
    destStream.Write(buffer, 0, buffer.Length);
    destStream.Seek(0, SeekOrigin.Begin);
    return destStream;
}

Note how the MemoryStream is instantiated. I am passing the capacity rather than the buffer (as in your own code). Why is that?

When using MemoryStream() or MemoryStream(int), you are creating a resizable MemoryStream instance, which you will want in case you make changes to your document. When using MemoryStream(byte[]) (as in your code), the MemoryStream instance is not resizable: it cannot grow beyond the array it wraps. That is a problem unless you make no changes to your document, or your changes only ever make it shrink in size.
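To see the difference in isolation, here is a minimal sketch that uses only System.IO and no Open XML types. The class and method names (MemoryStreamResizeDemo, CanGrow) are made up for this illustration. It relies on the documented behaviour that writing past the end of a non-resizable MemoryStream throws NotSupportedException.

```csharp
using System;
using System.IO;

public static class MemoryStreamResizeDemo
{
    // Tries to append extraBytes zero bytes at the end of the stream.
    // Returns true if the stream could grow, false if it is fixed-size.
    public static bool CanGrow(MemoryStream stream, int extraBytes)
    {
        try
        {
            stream.Seek(0, SeekOrigin.End);
            stream.Write(new byte[extraBytes], 0, extraBytes);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown by a MemoryStream created with MemoryStream(byte[])
            // when a write would exceed the wrapped array.
            return false;
        }
    }

    // Builds a resizable stream that holds a copy of data.
    public static MemoryStream CreateResizable(byte[] data)
    {
        var stream = new MemoryStream(data.Length);
        stream.Write(data, 0, data.Length);
        stream.Seek(0, SeekOrigin.Begin);
        return stream;
    }
}
```

With a 16-byte array named data, CanGrow(new MemoryStream(data), 1) returns false, while CanGrow(MemoryStreamResizeDemo.CreateResizable(data), 1) returns true.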

Now, to read a Word document into a MemoryStream, manipulate that Word document in memory, and end up with a consistent MemoryStream, you will have to do the following:

// Get a MemoryStream.
// In this example, the MemoryStream is created by reading a file stored
// in the file system. Depending on the Stream you "receive", it makes
// sense to copy the Stream to a MemoryStream before processing.
MemoryStream stream = ReadAllBytesToMemoryStream(@"C:\Original.docx");

// Open the Word document on the MemoryStream.
using (WordprocessingDocument wpd = WordprocessingDocument.Open(stream, true))
{
    MainDocumentPart mdp = wpd.MainDocumentPart;
    Document document = mdp.Document;
    // Manipulate document ...
}

// After having closed the WordprocessingDocument (by leaving the using statement),
// you can use the MemoryStream for whatever comes next, e.g., to write it to a
// file stored in the file system.
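// ToArray() copies only the bytes actually written to the stream;
// GetBuffer() would also include any unused capacity at the end.
File.WriteAllBytes(@"C:\New.docx", stream.ToArray());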
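
One detail worth checking when writing the stream out: MemoryStream.GetBuffer() returns the stream's whole internal buffer, unused capacity included, while MemoryStream.ToArray() copies only the bytes actually written. A minimal sketch of the difference follows. The names BufferVersusArrayDemo and Measure are made up for this illustration, and the exact buffer size after growth is an implementation detail, so the sketch only reports the two lengths.

```csharp
using System.IO;

public static class BufferVersusArrayDemo
{
    // Writes 5 bytes into a stream that starts with capacity 4, which
    // forces the internal buffer to grow, then reports both lengths.
    public static (int BufferLength, int ArrayLength) Measure()
    {
        var stream = new MemoryStream(4);
        stream.Write(new byte[5], 0, 5);

        // GetBuffer(): the internal buffer, as large as the current capacity.
        // ToArray(): a copy containing exactly stream.Length bytes.
        return (stream.GetBuffer().Length, stream.ToArray().Length);
    }
}
```

ArrayLength is always 5 here, while BufferLength is at least 5 and usually larger. Any extra bytes would end up at the end of a file written from GetBuffer().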

Note that you will have to reset the stream.Position property by calling stream.Seek(0, SeekOrigin.Begin) whenever your next action depends on the MemoryStream.Position property (e.g., CopyTo, CopyToAsync). Right after having left the using statement, the stream's position will be equal to its length.
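
The effect of the position is easy to reproduce without a Word document. In this minimal sketch (StreamPositionDemo and CopyAndCount are names made up for this illustration), CopyTo copies from the current position onwards, so a stream whose position sits at its end copies nothing until it is rewound.

```csharp
using System.IO;

public static class StreamPositionDemo
{
    // Copies source into a fresh MemoryStream and returns how many bytes
    // arrived. When rewind is false, copying starts at the current position.
    public static long CopyAndCount(MemoryStream source, bool rewind)
    {
        if (rewind)
        {
            source.Seek(0, SeekOrigin.Begin);
        }

        using (var destination = new MemoryStream())
        {
            source.CopyTo(destination);
            return destination.Length;
        }
    }
}
```

For a stream that has just had 3 bytes written to it, CopyAndCount(stream, false) returns 0 and CopyAndCount(stream, true) returns 3.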
