简体   繁体   中英

How to split a Word document (docx or rtf) using a heading1 style as the split

I have a large file I have split into many separate rtf files with this slightly modified code I got online. The problem was I didn't want to include the Heading 1 text in the output file. However the Heading 1 data is used to create the filename of each output document.

This is the format of the file I am splitting to new files.

1.1.1 This would be marked Heading1 style

some text in here some text in here some text in here some text in here

1.1.2 This would be marked Heading1 style

some text in here some text in here some text in here some text in here

1.1.3 This would be marked Heading1 style

some text in here some text in here some text in here some text in here

=============================================================================== So what it outputs are files named 1.1.1.rtf, 1.1.2.rtf etc and would just contain the body text, but no heading.

repeats to end

Any guidance will be appreciated.

Sub aSplitOnHeadings()
'
' SplitOnHeadings Macro
'
'
Application.ScreenUpdating = False
Dim StrTmplt As String, StrPath As String, StrFlNm As String, Rng As Range, Doc As Document, i As Long, extension As String
extension = ".rtf" ' Jon added so we can have 1.1.1 for the references
With ActiveDocument
  StrTmplt = .AttachedTemplate.FullName
  StrPath = .Path & "\"
  With .Range
    With .Find
      .ClearFormatting
      .Replacement.ClearFormatting
      .Text = ""
      .Style = "Heading 1"
      .Replacement.Text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
      .MatchCase = False
      .MatchWholeWord = False
      .MatchWildcards = False
      .MatchSoundsLike = False
      .MatchAllWordForms = False
      .Execute
    End With


        Do While .Find.Found

          Set Rng = .Paragraphs(1).Range.Duplicate


                With Rng
                  StrFlNm = Replace(.Text, vbCr, "")

                  For i = 1 To 255 'I took out the chr 46 the full stop because it is legal 44 comma
                    Select Case i
                      Case 1 To 31, 33, 34, 37, 42, 47, 58 - 63, 91 - 93, 96, 124, 147, 148
                      StrFlNm = Replace(StrFlNm, Chr(i), "")
                    End Select
                  Next

                        Do

                        If .Paragraphs.Last.Range.End = ActiveDocument.Range.End Then Exit Do

                          Select Case .Paragraphs.Last.Next.Style

                            Case "Heading 1"
                            Selection.EndKey Unit:=wdLine
                              Exit Do
                            Case Else
                              .MoveEnd wdParagraph, 1
                            End Select
                        Loop

                End With

          Set Doc = Documents.Add(Template:=StrTmplt, Visible:=False)
          With Doc
            .Range.FormattedText = Rng.FormattedText
            .SaveAs2 FileName:=StrPath & StrFlNm & extension, Fileformat:=wdFormatRTF, AddToRecentFiles:=False
            .Close False
          End With
          .Collapse wdCollapseEnd
          .Find.Execute
        Loop
  End With
End With
Set Doc = Nothing: Set Rng = Nothing
Application.ScreenUpdating = True
'.SaveAs2 FileName:=StrPath & StrFlNm, FileFormat:=wdFormatRTF, AddToRecentFiles:=False
'.SaveAs2 FileName:=StrPath & StrFlNm, Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
End Sub

Try something based on:

Sub SplitDoc()
Application.ScreenUpdating = False
Dim Rng As Range, DocSrc As Document, DocTgt As Document
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*./\:?|"
Set DocSrc = ActiveDocument
With DocSrc.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Format = True
    .Forward = True
    .Text = ""
    .Style = wdStyleHeading1
    .Replacement.Text = ""
    .Wrap = wdFindStop
    .Execute
  End With
  Do While .Find.Found
    Set Rng = .Paragraphs(1).Range
    Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
    Set DocTgt = Documents.Add(DocSrc.AttachedTemplate)
    With DocTgt
      .Range.FormattedText = Rng.FormattedText
      StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
      ' Strip out illegal characters
      For i = 1 To Len(StrNoChr)
        StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
      Next
      .Paragraphs.First.Range.Delete
      .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & ".docx", FileFormat:=wdFormatXMLDocument, AddToRecentFiles:=False
      .Close False
    End With
    .Start = Rng.End
    .Find.Execute
  Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub

These two sets macro will work. Each one splits a document on Heading1 style into separate documents with the document named as the Heading1 it was split at, and the Heading1 is not included in the new document. That is just perfect. Here are the two sets of macro two for output in .rtf and two for docx Also in these macro I removed the . from being an illegal character as I did need the output to be as per the Heading1 exactly. Thanks macropod for taking the time to sort this. I will try to learn more about macros.

Jon.

Sub SplitDocOnHeading1ToRtfWithHeadingInOutput()
'Splits the document on Heading1 style, into new documents, Heading1 is  included in the data.



Application.ScreenUpdating = False
Dim Rng As Range, DocSrc As Document, DocTgt As Document
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
With DocSrc.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Format = True
    .Forward = True
    .Text = ""
    .Style = wdStyleHeading1
    .Replacement.Text = ""
    .Wrap = wdFindStop
    .Execute
  End With
  Do While .Find.Found
    Set Rng = .Paragraphs(1).Range
    Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
    Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
    With DocTgt
    Application.ScreenUpdating = False
      .Range.FormattedText = Rng.FormattedText
      StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
      ' Strip out illegal characters
      For i = 1 To Len(StrNoChr)
        StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
      Next
      '.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
      .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & ".rtf", Fileformat:=wdFormatRTF, AddToRecentFiles:=False
      .Close False
    End With
    .Start = Rng.End
    .Find.Execute
  Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub

Sub SplitDocOnHeading1ToRtfNoHeadingInOutput()
'Splits the document on Heading1 style, into new documents, Heading1 is NOT included in the data



Application.ScreenUpdating = False
Dim Rng As Range, DocSrc As Document, DocTgt As Document
Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
Set DocSrc = ActiveDocument
With DocSrc.Range
  With .Find
    .ClearFormatting
    .Replacement.ClearFormatting
    .Format = True
    .Forward = True
    .Text = ""
    .Style = wdStyleHeading1
    .Replacement.Text = ""
    .Wrap = wdFindStop
    .Execute
  End With
  Do While .Find.Found
    Set Rng = .Paragraphs(1).Range
    Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
    Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
    With DocTgt
    Application.ScreenUpdating = False
      .Range.FormattedText = Rng.FormattedText
      StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
      ' Strip out illegal characters
      For i = 1 To Len(StrNoChr)
        StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
      Next
      .Paragraphs.First.Range.Delete 'comment out this line if you want to retain headings in the output file
      .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & ".rtf", Fileformat:=wdFormatRTF, AddToRecentFiles:=False
      .Close False
    End With
    .Start = Rng.End
    .Find.Execute
  Loop
End With
Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
Application.ScreenUpdating = True
End Sub

    Sub SplitDocOnHeading1ToDocxWithHeadingInOutput()
    'Splits the document on Heading1 style, into new documents, Heading1 is  included in the data.



    Application.ScreenUpdating = False
    Dim Rng As Range, DocSrc As Document, DocTgt As Document
    Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
    Set DocSrc = ActiveDocument
    With DocSrc.Range
      With .Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Format = True
        .Forward = True
        .Text = ""
        .Style = wdStyleHeading1
        .Replacement.Text = ""
        .Wrap = wdFindStop
        .Execute
      End With
      Do While .Find.Found
        Set Rng = .Paragraphs(1).Range
        Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
        Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
        With DocTgt
        Application.ScreenUpdating = False
          .Range.FormattedText = Rng.FormattedText
          StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
          ' Strip out illegal characters
          For i = 1 To Len(StrNoChr)
            StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
          Next
          '.Paragraphs.First.Range.Delete 'un comment this line if you don't want to retain headings in the output file
          .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
          .Close False
        End With
        .Start = Rng.End
        .Find.Execute
      Loop
    End With
    Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
    Application.ScreenUpdating = True
    End Sub


Sub SplitDocOnHeading1ToDocxNoHeadingInOutput()
        'Splits the document on Heading1 style, into new documents, Heading1 is NOT included in the data



        Application.ScreenUpdating = False
        Dim Rng As Range, DocSrc As Document, DocTgt As Document
        Dim i As Long, StrTxt As String: Const StrNoChr As String = """*/\:?|"
        Set DocSrc = ActiveDocument
        With DocSrc.Range
          With .Find
            .ClearFormatting
            .Replacement.ClearFormatting
            .Format = True
            .Forward = True
            .Text = ""
            .Style = wdStyleHeading1
            .Replacement.Text = ""
            .Wrap = wdFindStop
            .Execute
          End With
          Do While .Find.Found
            Set Rng = .Paragraphs(1).Range
            Set Rng = Rng.GoTo(What:=wdGoToBookmark, Name:="\HeadingLevel")
            Set DocTgt = Documents.Add(DocSrc.AttachedTemplate.FullName)
            With DocTgt
            Application.ScreenUpdating = False
              .Range.FormattedText = Rng.FormattedText
              StrTxt = Split(.Paragraphs.First.Range.Text, vbCr)(0)
              ' Strip out illegal characters
              For i = 1 To Len(StrNoChr)
                StrTxt = Replace(StrTxt, Mid(StrNoChr, i, 1), "_")
              Next
              .Paragraphs.First.Range.Delete 'comment out this line if you want to retain headings in the output file
              .SaveAs2 FileName:=DocSrc.Path & "\" & StrTxt & ".docx", Fileformat:=wdFormatXMLDocument, AddToRecentFiles:=False
              .Close False
            End With
            .Start = Rng.End
            .Find.Execute
          Loop
        End With
        Set Rng = Nothing: Set DocSrc = Nothing: Set DocTgt = Nothing
        Application.ScreenUpdating = True
        End Sub

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM