Word found unreadable content in xxx.docx after splitting a docx with Open XML

Question

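I have a file, full.docx, that contains two math problems. The docx embeds some pictures and MathType equations (OLE objects). I split the document by problem and got two files, first.docx and second.docx. first.docx works fine, but when I try to open second.docx, Word shows a warning dialog: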

"Word found unreadable content in second.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes."

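After I click "Yes", the document opens and its content is correct. What is wrong with second.docx? I have checked it with the Open XML SDK 2.5 Productivity Tool but could not find the cause. Any help would be much appreciated.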

The three files have been uploaded here.

Here is some of the code:

        // Load the template into an expandable in-memory stream so it can be edited.
        byte[] templateBytes = System.IO.File.ReadAllBytes(TEMPLATE_YANG_FILE);
        using (MemoryStream templateStream = new MemoryStream())
        {
            templateStream.Write(templateBytes, 0, templateBytes.Length);

            // Unique name for the output file.
            string guidStr = Guid.NewGuid().ToString();

            using (WordprocessingDocument document = WordprocessingDocument.Open(templateStream, true))
            {
                document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);

                MainDocumentPart mainPart = document.MainDocumentPart;

                // Replace the template content with a fresh, empty document body.
                mainPart.Document = new Document();
                Body bd = new Body();

                foreach (DocumentFormat.OpenXml.Wordprocessing.Paragraph clonedParagrph in lst)
                {
                    bd.AppendChild<DocumentFormat.OpenXml.Wordprocessing.Paragraph>(clonedParagrph);

                    // DrawingML pictures: copy each image part and repoint the relationship id.
                    clonedParagrph.Descendants<Blip>().ToList().ForEach(blip =>
                    {
                        var newRelation = document.CopyImage(blip.Embed, this.wordDocument);
                        blip.Embed = newRelation;
                    });

                    // VML pictures: same treatment for v:imagedata references.
                    clonedParagrph.Descendants<DocumentFormat.OpenXml.Vml.ImageData>().ToList().ForEach(imageData =>
                    {
                        var newRelation = document.CopyImage(imageData.RelationshipId, this.wordDocument);
                        imageData.RelationshipId = newRelation;
                    });
                }

                mainPart.Document.Body = bd;
                mainPart.Document.Save();
            }

            string subDocFile = System.IO.Path.Combine(this.outDir, guidStr + ".docx");
            this.subWordFileLst.Add(subDocFile);

            // The package must be disposed (above) before the stream bytes are written out.
            File.WriteAllBytes(subDocFile, templateStream.ToArray());
        }

lst contains the paragraphs cloned from the original docx, using:

(DocumentFormat.OpenXml.Wordprocessing.Paragraph)p.Clone();

Answer 1

Using the Productivity Tool, I found that the oleobjectx.bin parts had not been copied, so I added the following code after the Blip and ImageData copying:

clonedParagrph.Descendants<OleObject>().ToList().ForEach(ole =>
{
    var newRelation = document.CopyOleObject(ole.Id, this.wordDocument);
    ole.Id = newRelation;
});
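
CopyOleObject is not an Open XML SDK method; like CopyImage, it is a custom extension that the thread does not show. A minimal sketch of what such a helper might look like, assuming the OLE data is stored as an EmbeddedObjectPart related to the main document part (the class name OleCopyExtensions and the parameter names are hypothetical):

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;

public static class OleCopyExtensions
{
    // Hypothetical helper (not part of the SDK): copies the embedded OLE part
    // that relId points to in the source document into the target document
    // and returns the relationship id of the newly created part.
    public static string CopyOleObject(this WordprocessingDocument target,
                                       string relId,
                                       WordprocessingDocument source)
    {
        // Resolve the relationship id against the source main document part.
        OpenXmlPart sourcePart = source.MainDocumentPart.GetPartById(relId);

        // Create an empty embedded-object part with the same content type.
        EmbeddedObjectPart newPart =
            target.MainDocumentPart.AddEmbeddedObjectPart(sourcePart.ContentType);

        // Copy the binary payload (the oleObject .bin data) across.
        using (Stream data = sourcePart.GetStream(FileMode.Open, FileAccess.Read))
        {
            newPart.FeedData(data);
        }

        return target.MainDocumentPart.GetIdOfPart(newPart);
    }
}
```

The new relationship id has to be written back to the element, as the loop above does with ole.Id, because ids from the source package are not guaranteed to exist, or to mean the same thing, in the new package.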

This solved the problem.
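
A missing part like this shows up as a relationship id that the document body references but the package does not define. The following is a rough diagnostic sketch for spotting such ids without the Productivity Tool. It uses only the .NET base class library and assumes the main part lives at word/document.xml with its relationships in word/_rels/document.xml.rels, which is typical for Word-generated files but not guaranteed. The names RelationshipCheck and FindDanglingIds are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

public static class RelationshipCheck
{
    // Namespace of the r:id, r:embed, r:link, ... attributes.
    static readonly XNamespace R =
        "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    // Returns every relationship id referenced in word/document.xml that has
    // no matching entry in word/_rels/document.xml.rels.
    public static List<string> FindDanglingIds(Stream docx)
    {
        using (var zip = new ZipArchive(docx, ZipArchiveMode.Read, leaveOpen: true))
        {
            // Ids the package actually defines for the main document part.
            var defined = new HashSet<string>();
            ZipArchiveEntry relsEntry = zip.GetEntry("word/_rels/document.xml.rels");
            if (relsEntry != null)
            {
                using (Stream s = relsEntry.Open())
                {
                    foreach (XElement rel in XDocument.Load(s).Root.Elements())
                    {
                        defined.Add((string)rel.Attribute("Id"));
                    }
                }
            }

            ZipArchiveEntry docEntry = zip.GetEntry("word/document.xml");
            if (docEntry == null)
            {
                throw new InvalidDataException("word/document.xml not found (assumed location).");
            }

            // Ids the body refers to: any attribute in the relationships namespace.
            using (Stream s = docEntry.Open())
            {
                return XDocument.Load(s)
                    .Descendants()
                    .Attributes()
                    .Where(a => a.Name.Namespace == R)
                    .Select(a => a.Value)
                    .Where(id => !defined.Contains(id))
                    .Distinct()
                    .ToList();
            }
        }
    }
}
```

Calling RelationshipCheck.FindDanglingIds(File.OpenRead("second.docx")) on a file produced before the fix should, if the explanation above is right, list the ids used by the o:OLEObject elements. An empty list does not prove the file is valid, since this only checks one kind of problem.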

Answer 1 (accepted, 2020-02-13 14:23:26)
