What is the most efficient way to clone Office Open XML documents?

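When working with Office Open XML documents (e.g., documents created by Word, Excel, or PowerPoint since Office 2007 was released), you will often want to clone, or copy, an existing document and then make changes to that clone, thereby creating a new document.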

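Several questions have already been asked and answered in this context (sometimes incorrectly, or at least not optimally), which shows that users do run into problems. For example: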


  1. What are the possible ways to correctly clone or copy those documents?
  2. Which way is the most efficient?

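The following sample class shows multiple ways to correctly copy pretty much any file and return the copy on a MemoryStream or FileStream. You can then open a WordprocessingDocument (Word), SpreadsheetDocument (Excel), or PresentationDocument (PowerPoint) on that stream and make whatever changes you like, using the Open XML SDK and, optionally, the Open-XML-PowerTools.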

using System.IO;

namespace CodeSnippets.IO
{
    /// <summary>
    /// This class demonstrates multiple ways to clone files stored in the file system.
    /// In all cases, the source file is stored in the file system. Where the return type
    /// is a <see cref="MemoryStream"/>, the destination file will be stored only on that
    /// <see cref="MemoryStream"/>. Where the return type is a <see cref="FileStream"/>,
    /// the destination file will be stored in the file system and opened on that
    /// <see cref="FileStream"/>.
    /// </summary>
    /// <remarks>
    /// The contents of the <see cref="MemoryStream"/> instances returned by the sample
    /// methods can be written to a file as follows:
    ///     var stream = ReadAllBytesToMemoryStream(sourcePath);
    ///     File.WriteAllBytes(destPath, stream.ToArray());
    /// <see cref="MemoryStream.ToArray"/> copies the first <see cref="MemoryStream.Length"/>
    /// bytes to a new byte array. Where the MemoryStream was created using
    /// <see cref="MemoryStream()"/> or <see cref="MemoryStream(int)"/>, you can also use
    /// <see cref="MemoryStream.GetBuffer"/>, which avoids that copy and should thus be a
    /// tad faster. However, GetBuffer() returns the whole internal buffer, which can be
    /// longer than Length once the document has been modified, so write only its first
    /// Length bytes, e.g., using <see cref="MemoryStream.WriteTo(Stream)"/>.
    /// </remarks>
    public static class FileCloner
    {
        // Reads the whole file into a byte array and copies it into an expandable MemoryStream.
        public static MemoryStream ReadAllBytesToMemoryStream(string path)
        {
            byte[] buffer = File.ReadAllBytes(path);
            var destStream = new MemoryStream(buffer.Length);
            destStream.Write(buffer, 0, buffer.Length);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the source FileStream into an expandable MemoryStream.
        public static MemoryStream CopyFileStreamToMemoryStream(string path)
        {
            using FileStream sourceStream = File.OpenRead(path);
            var destStream = new MemoryStream((int) sourceStream.Length);
            sourceStream.CopyTo(destStream);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the source FileStream into a newly created (read-write) destination file.
        public static FileStream CopyFileStreamToFileStream(string sourcePath, string destPath)
        {
            using FileStream sourceStream = File.OpenRead(sourcePath);
            FileStream destStream = File.Create(destPath);
            sourceStream.CopyTo(destStream);
            destStream.Seek(0, SeekOrigin.Begin);
            return destStream;
        }

        // Copies the file with File.Copy() and then opens the copy for reading and writing.
        public static FileStream CopyFileAndOpenFileStream(string sourcePath, string destPath)
        {
            File.Copy(sourcePath, destPath, true);
            return new FileStream(destPath, FileMode.Open, FileAccess.ReadWrite, FileShare.None);
        }
    }
}
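When a MemoryStream is written back to disk, keep in mind that MemoryStream.GetBuffer() returns the entire internal array, which can be longer than the stream's Length once the document has been modified: a rewritten package that got shorter leaves stale bytes at the end, and one that got longer makes the stream grow its buffer beyond Length. MemoryStream.ToArray() and MemoryStream.WriteTo(Stream) only cover the first Length bytes. The following self-contained sketch illustrates this with plain System.IO. The class and method names are illustrative only, and SetLength(60) merely stands in for saving a document that got smaller.

```csharp
using System.IO;

public static class MemoryStreamBufferDemo
{
    // Simulates saving a smaller document into a MemoryStream whose capacity equals
    // the original file size (as a capacity-based clone stream would have).
    public static (long StreamLength, int BufferLength, long WrittenLength) Run()
    {
        var original = new byte[100];
        using var stream = new MemoryStream(original.Length);
        stream.Write(original, 0, original.Length);

        // Stand-in for a save that produces a shorter package than the original.
        stream.SetLength(60);

        // GetBuffer() exposes the whole internal array, including 40 stale bytes.
        int bufferLength = stream.GetBuffer().Length;

        // WriteTo() copies only the first Length bytes. In practice, the target would be
        // a FileStream created with File.Create(destPath); a MemoryStream keeps this
        // sketch free of file system access.
        using var target = new MemoryStream();
        stream.WriteTo(target);

        return (stream.Length, bufferLength, target.Length);
    }
}
```

Here, MemoryStreamBufferDemo.Run() returns (60, 100, 60): 60 valid bytes, a 100-byte internal buffer, and 60 bytes written by WriteTo(). Passing the full GetBuffer() array to File.WriteAllBytes() would therefore append 40 bytes of leftover data to the file.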

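In addition to the above methods, which are not specific to Open XML, you can also use the following approach, e.g., if you have already opened an OpenXmlPackage such as a WordprocessingDocument, SpreadsheetDocument, or PresentationDocument: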

public void DoWorkCloningOpenXmlPackage()
{
    using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);

    // There are multiple overloads of the Clone() method in the Open XML SDK.
    // This one clones the source document to the given destination path and
    // opens it in read-write mode.
    using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);

    // Make your changes to the clone, e.g., using ChangeWordprocessingDocument()
    // from the benchmark class below.
    ChangeWordprocessingDocument(wordDocument);
}

All of the above methods correctly clone or copy a document. But which one is the most efficient?

Enter our benchmark, which uses the BenchmarkDotNet NuGet package:

using System;
using System.Collections.Generic;
using System.Diagnostics.CodeAnalysis;
using System.IO;
using System.Linq;
using BenchmarkDotNet.Attributes;
using CodeSnippets.IO;
using CodeSnippets.OpenXml.Wordprocessing;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

namespace CodeSnippets.Benchmarks.IO
{
    public class FileClonerBenchmark
    {
        #region Setup and Helpers

        private const string SourcePath = "Source.docx";
        private const string DestPath = "Destination.docx";

        [Params(1, 10, 100, 1000)]
        public static int ParagraphCount;

        // Creates the source document for the current ParagraphCount.
        [GlobalSetup]
        public void GlobalSetup()
        {
            CreateTestDocument(SourcePath);
        }

        private static void CreateTestDocument(string path)
        {
            const string sentence = "The quick brown fox jumps over the lazy dog.";
            string text = string.Join(" ", Enumerable.Range(0, 22).Select(i => sentence));
            IEnumerable<string> texts = Enumerable.Range(0, ParagraphCount).Select(i => text);
            using WordprocessingDocument unused = WordprocessingDocumentFactory.Create(path, texts);
        }

        // Makes a small change so that each benchmark actually modifies the clone.
        private static void ChangeWordprocessingDocument(WordprocessingDocument wordDocument)
        {
            Body body = wordDocument.MainDocumentPart.Document.Body;
            Text text = body.Descendants<Text>().First();
            text.Text = DateTimeOffset.UtcNow.Ticks.ToString();
        }

        #endregion

        #region Benchmarks

        [Benchmark(Baseline = true)]
        public void DoWorkUsingReadAllBytesToMemoryStream()
        {
            using MemoryStream destStream = FileCloner.ReadAllBytesToMemoryStream(SourcePath);

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
            {
                ChangeWordprocessingDocument(wordDocument);
            }

            File.WriteAllBytes(DestPath, destStream.GetBuffer());
        }

        [Benchmark]
        public void DoWorkUsingCopyFileStreamToMemoryStream()
        {
            using MemoryStream destStream = FileCloner.CopyFileStreamToMemoryStream(SourcePath);

            using (WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true))
            {
                ChangeWordprocessingDocument(wordDocument);
            }

            File.WriteAllBytes(DestPath, destStream.GetBuffer());
        }

        [Benchmark]
        public void DoWorkUsingCopyFileStreamToFileStream()
        {
            using FileStream destStream = FileCloner.CopyFileStreamToFileStream(SourcePath, DestPath);
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        [Benchmark]
        public void DoWorkUsingCopyFileAndOpenFileStream()
        {
            using FileStream destStream = FileCloner.CopyFileAndOpenFileStream(SourcePath, DestPath);
            using WordprocessingDocument wordDocument = WordprocessingDocument.Open(destStream, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        [Benchmark]
        public void DoWorkCloningOpenXmlPackage()
        {
            using WordprocessingDocument sourceWordDocument = WordprocessingDocument.Open(SourcePath, false);
            using var wordDocument = (WordprocessingDocument) sourceWordDocument.Clone(DestPath, true);
            ChangeWordprocessingDocument(wordDocument);
        }

        #endregion
    }
}


using BenchmarkDotNet.Running;
using CodeSnippets.Benchmarks.IO;

namespace CodeSnippets.Benchmarks
{
    public static class Program
    {
        public static void Main()
        {
            BenchmarkRunner.Run<FileClonerBenchmark>();
        }
    }
}

So what are the results on my machine? Which method is the fastest?

BenchmarkDotNet=v0.12.0, OS=Windows 10.0.18362
Intel Core i7-7500U CPU 2.70GHz (Kaby Lake), 1 CPU, 4 logical and 2 physical cores
.NET Core SDK=3.0.100
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  DefaultJob : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
| Method                                  | ParaCount |      Mean |     Error |    StdDev |    Median | Ratio |
| --------------------------------------- | --------- | --------: | --------: | --------: | --------: | ----: |
| DoWorkUsingReadAllBytesToMemoryStream   | 1         |  1.548 ms | 0.0298 ms | 0.0279 ms |  1.540 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 1         |  1.561 ms | 0.0305 ms | 0.0271 ms |  1.556 ms |  1.01 |
| DoWorkUsingCopyFileStreamToFileStream   | 1         |  2.394 ms | 0.0601 ms | 0.1100 ms |  2.354 ms |  1.55 |
| DoWorkUsingCopyFileAndOpenFileStream    | 1         |  3.302 ms | 0.0657 ms | 0.0855 ms |  3.312 ms |  2.12 |
| DoWorkCloningOpenXmlPackage             | 1         |  4.567 ms | 0.1218 ms | 0.3591 ms |  4.557 ms |  3.13 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 10        |  1.737 ms | 0.0337 ms | 0.0361 ms |  1.742 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 10        |  1.752 ms | 0.0347 ms | 0.0571 ms |  1.739 ms |  1.01 |
| DoWorkUsingCopyFileStreamToFileStream   | 10        |  2.505 ms | 0.0390 ms | 0.0326 ms |  2.500 ms |  1.44 |
| DoWorkUsingCopyFileAndOpenFileStream    | 10        |  3.532 ms | 0.0731 ms | 0.1860 ms |  3.455 ms |  2.05 |
| DoWorkCloningOpenXmlPackage             | 10        |  4.446 ms | 0.0880 ms | 0.1470 ms |  4.424 ms |  2.56 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 100       |  2.847 ms | 0.0563 ms | 0.0553 ms |  2.857 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 100       |  2.865 ms | 0.0561 ms | 0.0786 ms |  2.868 ms |  1.02 |
| DoWorkUsingCopyFileStreamToFileStream   | 100       |  3.550 ms | 0.0697 ms | 0.0881 ms |  3.570 ms |  1.25 |
| DoWorkUsingCopyFileAndOpenFileStream    | 100       |  4.456 ms | 0.0877 ms | 0.0861 ms |  4.458 ms |  1.57 |
| DoWorkCloningOpenXmlPackage             | 100       |  5.958 ms | 0.1242 ms | 0.2727 ms |  5.908 ms |  2.10 |
|                                         |           |           |           |           |           |       |
| DoWorkUsingReadAllBytesToMemoryStream   | 1000      | 12.378 ms | 0.2453 ms | 0.2519 ms | 12.442 ms |  1.00 |
| DoWorkUsingCopyFileStreamToMemoryStream | 1000      | 12.538 ms | 0.2070 ms | 0.1835 ms | 12.559 ms |  1.02 |
| DoWorkUsingCopyFileStreamToFileStream   | 1000      | 12.919 ms | 0.2457 ms | 0.2298 ms | 12.939 ms |  1.05 |
| DoWorkUsingCopyFileAndOpenFileStream    | 1000      | 13.728 ms | 0.2803 ms | 0.5196 ms | 13.652 ms |  1.11 |
| DoWorkCloningOpenXmlPackage             | 1000      | 16.868 ms | 0.2174 ms | 0.1927 ms | 16.801 ms |  1.37 |

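It turns out that DoWorkUsingReadAllBytesToMemoryStream() is consistently the fastest method. However, its lead over DoWorkUsingCopyFileStreamToMemoryStream() lies within the margin of error (a ratio of only 1.01 to 1.02). This means that you should open your Open XML documents on a MemoryStream for processing whenever possible. If you don't even have to store the resulting document in the file system, this should be faster still, and much faster than unnecessarily using a FileStream.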
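If you follow this advice, make sure the MemoryStream can grow, because the edited document may end up larger than the original. A stream created with new MemoryStream(byte[]) wraps that array at a fixed size: writing past its end throws NotSupportedException, and GetBuffer() throws UnauthorizedAccessException. A stream created with a capacity, as in ReadAllBytesToMemoryStream(), is expandable. That this is why the method copies the bytes instead of wrapping them is my reading; the text above does not say so. The following sketch, with illustrative names, checks both cases using only System.IO:

```csharp
using System;
using System.IO;

public static class MemoryStreamExpandability
{
    // Returns true if appending one byte at the end of the stream succeeds.
    public static bool CanGrow(MemoryStream stream)
    {
        stream.Seek(0, SeekOrigin.End);
        try
        {
            stream.WriteByte(0);
            return true;
        }
        catch (NotSupportedException)
        {
            // Thrown by a MemoryStream that wraps a fixed-size byte array.
            return false;
        }
    }

    public static (bool Wrapped, bool Copied) Compare(byte[] fileBytes)
    {
        // Wrapping the array: writable, but not expandable.
        using var wrapped = new MemoryStream(fileBytes);

        // Copying into a capacity-based stream: expandable.
        using var copied = new MemoryStream(fileBytes.Length);
        copied.Write(fileBytes, 0, fileBytes.Length);

        return (CanGrow(wrapped), CanGrow(copied));
    }
}
```

For example, MemoryStreamExpandability.Compare(new byte[10]) returns (false, true): only the copied stream could absorb a document that grew while being edited.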

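Wherever an output FileStream is involved, you will see more "significant" differences, particularly for small documents: with one paragraph, DoWorkUsingCopyFileStreamToFileStream() and DoWorkUsingCopyFileAndOpenFileStream() take 1.55 and 2.12 times as long as the baseline, whereas with 1,000 paragraphs the ratios shrink to 1.05 and 1.11. (Note that milliseconds can add up if you process large numbers of documents.) You should also note that using File.Copy() is actually not a good approach: it was slower than copying the FileStream yourself in every configuration measured.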

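Finally, it turns out that using the OpenXmlPackage.Clone() method, or one of its overloads, is the slowest approach, at 1.37 to 3.13 times the baseline. This is because it involves more elaborate logic than just copying bytes. However, if all you have is a reference to an OpenXmlPackage (or one of its subclasses), the Clone() method and its overloads are your best choice.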

You can find the full source code in my CodeSnippets GitHub repository. Look at the CodeSnippets.Benchmark project and the FileCloner class.


