
VBA Excel - Delimit and parse sections in a Word document in order to input data into Excel

I am trying to find a way, using VBA, to parse a Word document so that I can put its contents into an array. For this example I have two companies in a Word document (shown below the code), and I want to put their fields into an array.

Public Sub ParseCompanies()

Dim Company_Array(1 To 2) As String 'stores individual company fields
Dim Companies_Array() 'array for all companies

Dim oWord As Object, oDoc As Object
Set oWord = CreateObject("Word.Application")

Set oDoc = oWord.Documents.Open("C:/Temp/test.docx", Visible:=True)

Dim singleLine
Dim lineText As String

'need to rewrite this section
For Each singleLine In oDoc.Paragraphs

   lineText = singleLine.Range.Text
   Debug.Print lineText    

Next singleLine


End Sub

Word file contents cut and pasted onto Stack Overflow:


Company: Aladin Carpets

Product: Magic Carpets


Company: Aerials Seashells

Product: Seashells


The output of the current script can be seen below in the VBA debugger (Immediate window):

[image: VBA debugger output]

Is there an efficient way to do this? Is there a way to delimit the lines, or to use section separators in the Word document, in order to parse the individual companies?

Solution:
If the output is as stated (I copied your data but got different results), this should work; if not, just adjust which element is saved in the array:

Public Sub ParseCompanies()

Const VALUE_OFFSET As Long = 2 'the value sits 2 paragraphs after its label, per the screenshot

Dim Products_Array() As String 'product of each company
Dim Companies_Array() As String 'name of each company
Dim CounterElements As Long: CounterElements = 1
Dim CounterParagraphs As Long
Dim lineText As String

Dim oWord As Object, oDoc As Object
Set oWord = CreateObject("Word.Application")
On Error GoTo Err01ParseCompanies
Set oDoc = oWord.Documents.Open("C:\Users\lz630z\Desktop\Company.docx", Visible:=True)

'stop early so that CounterParagraphs + VALUE_OFFSET never runs past the last paragraph
For CounterParagraphs = 1 To oDoc.Paragraphs.Count - VALUE_OFFSET
   lineText = oDoc.Paragraphs(CounterParagraphs).Range.Text
   If InStr(lineText, "Company") > 0 Then
      ReDim Preserve Companies_Array(1 To CounterElements)
      Companies_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + VALUE_OFFSET).Range.Text
   End If
   If InStr(lineText, "Product") > 0 Then
      ReDim Preserve Products_Array(1 To CounterElements)
      Products_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + VALUE_OFFSET).Range.Text
      CounterElements = CounterElements + 1
   End If
Next CounterParagraphs

CleanUp:
On Error Resume Next 'oDoc is Nothing if Documents.Open failed
oDoc.Close False 'close without saving
oWord.Quit
Set oDoc = Nothing
Set oWord = Nothing
Exit Sub

Err01ParseCompanies:
MsgBox "Word Error", vbCritical
Resume CleanUp
End Sub

Summary of changes/suggestions
For Each won't work here: according to the screenshot, the value appears two paragraphs after the line where the label is found, so it is better to control the index yourself and save the elements into the arrays accordingly. I therefore changed to a For/To loop (I assumed your arrays are meant to be as defined now). The two arrays are filled in step, so whenever you refer to an element of one, the matching element of the other is at the same index.
E.g.: Companies_Array(1) will be Aladin Carpets and Products_Array(1) will be Magic Carpets
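
Since the end goal is to get the data into Excel, the two parallel arrays could be combined into one two-column array and written to a sheet in a single assignment. The following is only a sketch under assumptions: CleanParagraphText, PairArrays and WriteCompaniesToSheet are hypothetical helper names that are not part of the answer above, both arrays are assumed to be filled and equally indexed, and the target cell A1 of the active sheet is an arbitrary choice. Note that Range.Text of a Word paragraph includes the trailing paragraph mark (vbCr), which is why the values are cleaned first.

```vba
Option Explicit

'Hypothetical helper: removes the paragraph mark (vbCr) that
'Paragraph.Range.Text includes, plus surrounding spaces.
Public Function CleanParagraphText(ByVal rawText As String) As String
    CleanParagraphText = Trim$(Replace(rawText, vbCr, ""))
End Function

'Hypothetical helper: combines two equally indexed 1-D arrays into a
'1-based 2-D array (column 1 = company, column 2 = product).
'Assumes both arrays have already been filled.
Public Function PairArrays(companies() As String, products() As String) As Variant
    Dim table() As String
    Dim i As Long, r As Long

    ReDim table(1 To UBound(companies) - LBound(companies) + 1, 1 To 2)
    For i = LBound(companies) To UBound(companies)
        r = i - LBound(companies) + 1
        table(r, 1) = CleanParagraphText(companies(i))
        'guard in case a company has no matching product entry
        If i >= LBound(products) And i <= UBound(products) Then
            table(r, 2) = CleanParagraphText(products(i))
        End If
    Next i
    PairArrays = table
End Function

'Hypothetical usage: writes the table to the active sheet starting at A1
'(the target cell is an assumption).
Public Sub WriteCompaniesToSheet(companies() As String, products() As String)
    Dim table As Variant
    table = PairArrays(companies, products)
    ActiveSheet.Range("A1").Resize(UBound(table, 1), 2).Value = table
End Sub
```

Calling WriteCompaniesToSheet Companies_Array, Products_Array at the end of the loop would then put one company per row. Writing the whole array at once is usually preferable to writing cell by cell.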

If the items in your Word document are truly paragraph delimited, you could use the Split function to fill your array and then loop through it to manipulate the data. For example, this just fills the array and prints the elements to the Immediate window:

Public Sub ParseCompanies()
    Dim wordList() As String
    Dim i As Long
    'early binding: needs a reference to the Microsoft Word Object Library (Tools > References)
    Dim oWord As Word.Application
    Dim oDoc As Word.Document

    Set oWord = CreateObject("Word.Application")
    Set oDoc = oWord.Documents.Open("C:\Users\test\Desktop\Company.docx", Visible:=False)

    wordList = Split(oDoc.Content.Text, vbCr) 'split using carriage return (paragraphs)

    For i = 0 To UBound(wordList, 1)
        Debug.Print wordList(i)
    Next i

    oWord.Quit
End Sub

I can't speak to the performance of this method on a large file, so it may require testing before it can be considered a viable option.
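
To go from the flat list of lines to per-company records, the "Company:" and "Product:" labels from the sample document can serve as delimiters. The following is a minimal sketch, not a definitive implementation: ValueAfterLabel and BuildCompanyTable are hypothetical helper names, and it assumes every record starts with a "Company:" line followed (possibly after blank lines) by its "Product:" line. Blank entries, including the empty last element that Split produces because the document text ends with a paragraph mark, are simply skipped.

```vba
Option Explicit

'Hypothetical helper: returns the text after "label:" or "" if the line
'does not start with that label. The comparison is case-insensitive.
Private Function ValueAfterLabel(ByVal lineText As String, ByVal label As String) As String
    Dim prefix As String
    prefix = label & ":"
    lineText = Trim$(lineText)
    If StrComp(Left$(lineText, Len(prefix)), prefix, vbTextCompare) = 0 Then
        ValueAfterLabel = Trim$(Mid$(lineText, Len(prefix) + 1))
    End If
End Function

'Hypothetical helper: turns the lines produced by Split into a 1-based
'two-column array (column 1 = company, column 2 = product).
'Returns Empty when no "Company:" line is found.
Public Function BuildCompanyTable(lines() As String) As Variant
    Dim result() As String
    Dim i As Long, rowCount As Long, r As Long
    Dim company As String, product As String

    'first pass: count the companies so the array is sized only once
    For i = LBound(lines) To UBound(lines)
        If Len(ValueAfterLabel(lines(i), "Company")) > 0 Then rowCount = rowCount + 1
    Next i
    If rowCount = 0 Then Exit Function

    'second pass: a "Company:" line opens a new row, a "Product:" line fills it
    ReDim result(1 To rowCount, 1 To 2)
    For i = LBound(lines) To UBound(lines)
        company = ValueAfterLabel(lines(i), "Company")
        product = ValueAfterLabel(lines(i), "Product")
        If Len(company) > 0 Then
            r = r + 1
            result(r, 1) = company
        ElseIf Len(product) > 0 And r > 0 Then
            result(r, 2) = product
        End If
    Next i
    BuildCompanyTable = result
End Function
```

With wordList filled as in the example above, table = BuildCompanyTable(wordList) followed by ActiveSheet.Range("A1").Resize(UBound(table, 1), 2).Value = table would write one company per row (table declared As Variant and checked with IsEmpty first; the target cell is an assumption).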
