
A quick method to remove a huge number of bookmarks from a Word document

I'm looking for fast code to delete bookmarks from a .docx file that is open in MS Word.

At the moment I am using a simple VBA macro with a few experience-based tweaks:

Public Sub RemoveBookmarks(ByRef doc As Document)
    Dim b As Bookmark
    Dim i As Long
    For Each b In doc.Bookmarks
        b.Delete
        'Some documents froze Word after only 4 deletions, so clear the undo stack regularly
        i = i + 1
        If i Mod 4 = 0 Then
            doc.UndoClear
        End If
        'Yield to Word periodically so that Ctrl+Break can be handled
        If i Mod 100 = 0 Then
            DoEvents
        End If
    Next b
    Set b = Nothing
End Sub

Very often my colleagues have large documents (over 1,200 pages) with 25,000 or more bookmarks. Deleting these bookmarks takes a lot of time.

Deleting bookmarks with DocumentFormat.OpenXml (the Open XML SDK) by manipulating a WordprocessingDocument is very fast:

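using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static void RemoveAllBookmarks(string fileName)
{
    // Open the file for editing (second argument: isEditable = true).
    using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(fileName, true))
    {
        var d = wordDoc.MainDocumentPart.Document;

        // ToList() snapshots the matches so elements can be removed while iterating.
        var bstart = d.Descendants<BookmarkStart>().ToList();
        foreach (var s in bstart)
            s.Remove();

        var bend = d.Descendants<BookmarkEnd>().ToList();
        foreach (var e in bend)
            e.Remove();

        d.Save();
        wordDoc.Save();
    }
}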

but I want to avoid closing and reopening the document in Word, because adding and removing bookmarks is part of a larger process. I can't predict (and would rather not have to) which will be faster for a given document: deleting with VBA, or closing, cleaning and reopening a huge file several times.

Maybe there is a way to manipulate the WordprocessingDocument underneath the open document and insert the XML.

Whenever you delete items from a collection you need to start from the end and work backwards.

Public Sub RemoveBookmarks(ByRef doc As Document)
    Dim b As Long
    For b = doc.Bookmarks.Count To 1 Step -1
        doc.Bookmarks(b).Delete
    Next b
End Sub

If you want to transform the already opened Word document, you can use the WordOpenXML property of a Document or Range instance to get the document's or range's Open XML markup in the Flat OPC format. You can then work with that XML string in the following ways:

  • Using the DocumentFormat.OpenXml NuGet package (Open XML SDK), you can turn the Flat OPC string into a WordprocessingDocument and transform it as you described.
  • Using LINQ to XML (System.Xml.Linq) and the DocumentFormat.OpenXml.Linq NuGet package, you can turn it into an XElement and transform it without having to turn it into a WordprocessingDocument.

Once you have transformed the markup, you can turn it back into a Flat OPC string and insert it back into the Word document, using the Range.InsertXML() method.
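As a rough illustration of the LINQ to XML route, here is a minimal sketch that strips every w:bookmarkStart and w:bookmarkEnd element from a Flat OPC string. The class name FlatOpcBookmarkRemover is made up for this example, and it uses only System.Xml.Linq with the namespace URI written out by hand rather than the DocumentFormat.OpenXml.Linq package, so treat it as one possible shape and not as the definitive implementation:

```csharp
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, named for illustration only.
public static class FlatOpcBookmarkRemover
{
    // WordprocessingML main namespace, where w:bookmarkStart and w:bookmarkEnd live.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Takes Flat OPC markup (for example the value of a Range's WordOpenXML
    // property) and returns the same markup without any bookmark elements.
    public static string RemoveAllBookmarks(string flatOpc)
    {
        // Preserve whitespace so that text runs are not altered by the round trip.
        XDocument package = XDocument.Parse(flatOpc, LoadOptions.PreserveWhitespace);

        // Snapshot the matches first: removing nodes while lazily enumerating
        // Descendants() would break the iteration.
        var bookmarks = package
            .Descendants()
            .Where(e => e.Name == W + "bookmarkStart" || e.Name == W + "bookmarkEnd")
            .ToList();

        foreach (XElement bookmark in bookmarks)
            bookmark.Remove();

        // Note: XDocument.ToString() does not emit the XML declaration.
        return package.ToString(SaveOptions.DisableFormatting);
    }
}
```

With a Word interop reference to the open document (called doc here, which is an assumption about your setup), the round trip would look something like doc.Content.InsertXML(FlatOpcBookmarkRemover.RemoveAllBookmarks(doc.Content.WordOpenXML)). It is worth trying this on a copy of a large document first, both to time it and to check that the reinserted content matches what you expect.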

Transforming Open XML markup is one or two orders of magnitude faster than using the COM APIs, if you need many COM calls to create the desired result. Note, though, that retrieving the Open XML markup from an opened document and inserting it back into the document is also not free.
