
Parse Word document in VBScript

I got a weird mission from a friend: to parse a bunch of Word files and write certain parts of them to a text file for further processing.

VBScript is not my cup of tea, so I'm not sure how to fit the pieces together.

The documents look like this:

Header
A lot of not interesting text
Table
Header
More boring text
Table

I want to parse the documents and get all the headers and the contents of the tables out of them. I'm stepping through the document with

For Each wPara In wd.ActiveDocument.Paragraphs

And I think I know how to get the headers:

If Left(wPara.Range.Style, Len("Heading")) = "Heading" Then

But I'm unsure how to do the

Else if .. this paragraph belongs to a table..

So any hint on how I could determine whether a paragraph is part of a table would be nice.

Untested, because I have no access to MS Word right now.

Option Explicit

Dim FSO, Word, textfile, doc, para

' start Word instance, open doc ...
' start FileSystemObject instance, open textfile for output...

For Each para In doc.Paragraphs
    If IsHeading(para) Or IsInTable(para) Then 
        SaveToFile textfile, para
    End If
Next

Function IsHeading(para)
    IsHeading = para.OutlineLevel < 10
End Function

Function IsInTable(para)
    Dim p, dummy
    IsInTable = False

    Set p = para.Parent
    ' at some point p and p.Parent will both be the Word Application object
    Do While Not p Is p.Parent
        ' dirty check: if p is a table, calling a table object method will work
        On Error Resume Next
        Set dummy = p.Cell(1, 1)
        If Err.Number = 0 Then
            IsInTable = True
            Exit Do
        Else 
            Err.Clear
        End If
        On Error GoTo 0

        Set p = p.Parent
    Loop
End Function

Obviously, SaveToFile is something you'd implement yourself.
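
For what it's worth, here is a minimal sketch of the parts left out above: starting Word, opening the output file, and one possible SaveToFile. It is untested as well. The two paths are placeholders, and the decision to strip the paragraph mark and the end-of-cell marker is my assumption about what the text file should contain, not something the question specifies.

```vbscript
Option Explicit

' Placeholder paths: adjust to your own files.
Const DOC_PATH = "C:\docs\sample.docx"
Const OUT_PATH = "C:\docs\sample.txt"

Dim FSO, Word, textfile, doc

' Start a hidden Word instance and open the document read-only.
' Documents.Open arguments used here: FileName, ConfirmConversions, ReadOnly
Set Word = CreateObject("Word.Application")
Word.Visible = False
Set doc = Word.Documents.Open(DOC_PATH, False, True)

' Create the output file: overwrite = True, Unicode = True
' (Unicode avoids write errors on characters outside the ANSI code page).
Set FSO = CreateObject("Scripting.FileSystemObject")
Set textfile = FSO.CreateTextFile(OUT_PATH, True, True)

' ... the For Each loop over doc.Paragraphs goes here ...

textfile.Close
doc.Close 0      ' 0 = wdDoNotSaveChanges
Word.Quit

' One possible SaveToFile: writes the paragraph text as a single line.
Sub SaveToFile(textfile, para)
    Dim text
    text = para.Range.Text
    ' A paragraph's text ends with Chr(13); the last paragraph in a
    ' table cell also carries the end-of-cell marker Chr(7).
    text = Replace(text, Chr(7), "")
    text = Replace(text, Chr(13), "")
    textfile.WriteLine text
End Sub
```

Calling it as a statement (SaveToFile textfile, para) matters: VBScript rejects parentheses around multiple arguments when calling a Sub without Call.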


Since "is in table" is naturally defined as "the object's parent is a table", this is a perfect situation to use recursion (deconstructed a little further):

Function IsInTable(para)
    IsInTable = IsTable(para.Parent)
    If Not (IsInTable Or para Is para.Parent) Then 
        IsInTable = IsInTable(para.Parent)
    End If
End Function

Function IsTable(obj)
    Dim dummy
    On Error Resume Next
    Set dummy = obj.Cell(1, 1)
    IsTable = (Err.Number = 0)
    Err.Clear
    On Error GoTo 0
End Function
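
If walking up the Parent chain does not give the expected result, a possible alternative is to ask the paragraph's Range directly. The sketch below assumes Word's Range.Information property with the wdWithInTable value (12) from the WdInformation enumeration; the function name IsInTableByInfo is made up for this example, and the code is untested like the rest.

```vbscript
Option Explicit

' VBScript cannot see Word's named constants, so declare the value here.
Const wdWithInTable = 12

' Hypothetical alternative to IsInTable: True when the paragraph's
' range lies inside a table, according to Word itself.
Function IsInTableByInfo(para)
    IsInTableByInfo = CBool(para.Range.Information(wdWithInTable))
End Function
```

It would slot into the loop in place of IsInTable(para), with no error trapping needed.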

