VBA Excel - Delimit and parse sections in a Word document in order to input data into Excel
I am trying to figure out a way to parse a Word document with VBA so I can put its contents into an array. For this example I have two companies in a Word document (shown below the code) and I want to put their fields into an array.
Public Sub ParseCompanies()
Dim Company_Array(1 To 2) As String 'stores individual company fields
Dim Companies_Array() 'array for all companies
Dim oWord As Object, oDoc As Object
Set oWord = CreateObject("Word.Application")
Set oDoc = oWord.Documents.Open("C:/Temp/test.docx", Visible:=True)
Dim singleLine
Dim lineText As String
'need to rewrite this section
For Each singleLine In oDoc.Paragraphs
lineText = singleLine.Range.Text
Debug.Print lineText
Next singleLine
End Sub
Word file contents, cut and pasted onto Stack Overflow:
Company: Aladin Carpets
Product: Magic Carpets
Company: Aerials Seashells
Product: Seashells
The way the current script runs can be seen below in the VBA debugger output.
Is there an efficient way to do this? Is there a way to delimit the lines, or to use section breaks in the Word document, in order to parse the individual companies?
Solution:
If the output is as stated (I copied your data but got different results), this should work. If not, just adjust which element is saved in the array:
Public Sub ParseCompanies()
    Dim Products_Array() As String   'product names, index-aligned with Companies_Array
    Dim Companies_Array() As String  'company names
    Dim CounterElements As Long: CounterElements = 1
    Dim CounterParagraphs As Long
    Dim lineText As String
    Dim oWord As Object, oDoc As Object
    Set oWord = CreateObject("Word.Application")
    On Error GoTo Err01ParseCompanies
    Set oDoc = oWord.Documents.Open("C:\Users\lz630z\Desktop\Company.docx", Visible:=True)
    'The value is assumed to sit two paragraphs after its label,
    'so stop two paragraphs early to keep the +2 lookup in range.
    For CounterParagraphs = 1 To oDoc.Paragraphs.Count - 2
        lineText = oDoc.Paragraphs(CounterParagraphs).Range.Text
        If InStr(lineText, "Company") > 0 Then
            ReDim Preserve Companies_Array(1 To CounterElements)
            Companies_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + 2).Range.Text
        End If
        If InStr(lineText, "Product") > 0 Then
            ReDim Preserve Products_Array(1 To CounterElements)
            Products_Array(CounterElements) = oDoc.Paragraphs(CounterParagraphs + 2).Range.Text
            CounterElements = CounterElements + 1
        End If
    Next CounterParagraphs
CleanUp:
    'close the document and Word so no hidden instance is left running
    On Error Resume Next
    If Not oDoc Is Nothing Then oDoc.Close False
    oWord.Quit
    Set oDoc = Nothing
    Set oWord = Nothing
    Exit Sub
Err01ParseCompanies:
    MsgBox "Word Error: " & Err.Description, vbCritical
    Resume CleanUp
End Sub
Summary of changes/suggestions
For Each won't work here: according to the screenshot, the value sits two rows after the row where the label is found, so it is better to control the index explicitly and save the elements into the arrays accordingly. I changed the loop to a For/To approach (I assumed your arrays are meant to be as defined now). The two arrays are filled in step, so the same index in each refers to the same company.
E.g.: Companies_Array(1) will be Aladin Carpets and Products_Array(1) will be Magic Carpets.
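If the label and its value share one paragraph, as in the pasted file contents ("Company: Aladin Carpets"), the value could be read from the same line rather than two paragraphs later. The following is only a minimal sketch under that assumption. ExtractField and its parameter names are hypothetical and not part of the answer above.

```vba
'Hypothetical helper: returns the text after "fieldLabel:" in lineText,
'or an empty string when the line does not start with that label.
Public Function ExtractField(ByVal lineText As String, ByVal fieldLabel As String) As String
    Dim prefix As String
    prefix = fieldLabel & ":"
    'Range.Text ends with a paragraph mark (vbCr); remove it before comparing
    lineText = Trim$(Replace(lineText, vbCr, ""))
    If StrComp(Left$(lineText, Len(prefix)), prefix, vbTextCompare) = 0 Then
        ExtractField = Trim$(Mid$(lineText, Len(prefix) + 1))
    Else
        ExtractField = vbNullString
    End If
End Function
```

Inside the loop, something like Companies_Array(CounterElements) = ExtractField(lineText, "Company") would then replace the + 2 lookup, provided the document really is laid out that way.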
If your Word document items are truly paragraph delimited, you could use the Split function to fill your array and then loop through it to manipulate the data. For example, this just fills the array and prints the elements to the Immediate window:
Public Sub ParseCompanies()
    'Early binding: requires a reference to the Microsoft Word Object Library
    Dim wordList() As String
    Dim i As Long
    Dim oWord As Word.Application
    Dim oDoc As Word.Document
    Set oWord = CreateObject("Word.Application")
    Set oDoc = oWord.Documents.Open("C:\Users\test\Desktop\Company.docx", Visible:=False)
    wordList = Split(oDoc.Content.Text, vbCr) 'split on carriage returns (paragraph marks)
    For i = 0 To UBound(wordList)
        Debug.Print wordList(i)
    Next i
    oDoc.Close SaveChanges:=False
    oWord.Quit
End Sub
I can't speak to the performance of this method on a large file, so it should be tested before it is considered a viable option.
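Since the goal in the question is to get the data into Excel, the split array could then be paired up and written to a worksheet. This is only a sketch: it assumes each line has the form "Label: value" as in the pasted file contents, that every company is followed by its product, and that writing to the active sheet is acceptable. WriteCompaniesToSheet is a hypothetical name.

```vba
'Sketch: pair "Company:" / "Product:" lines and write them to the active sheet.
'Column 1 receives the company, column 2 the product.
Public Sub WriteCompaniesToSheet(ByRef wordList() As String)
    Dim i As Long, rowIndex As Long
    Dim parts() As String
    rowIndex = 1
    For i = LBound(wordList) To UBound(wordList)
        parts = Split(wordList(i), ":", 2)      'at most two pieces: label, value
        If UBound(parts) = 1 Then               'skip lines without a colon
            Select Case LCase$(Trim$(parts(0)))
                Case "company"
                    ActiveSheet.Cells(rowIndex, 1).Value = Trim$(parts(1))
                Case "product"
                    ActiveSheet.Cells(rowIndex, 2).Value = Trim$(parts(1))
                    rowIndex = rowIndex + 1     'a product closes the current row
            End Select
        End If
    Next i
End Sub
```

It would be called as WriteCompaniesToSheet wordList once the array has been filled by Split.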