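Parse HTML text not in a tag using VBA
I need to get some text that is not contained in any HTML element other than the body tag. The problem is that the text is broken up by other tags, and each piece needs to go into its own cell.
For example: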
<a id="00:00:00" class="ts">[00:00:00]</a> <font class="mn">Name1</font> First bit of text<br/>
<a id="00:00:09" class="ts">[00:00:09]</a> <font class="mn">Name2</font> Second Line of Text<br/>
<a id="00:01:17" class="ts">[00:01:17]</a> <font class="mn">Name3</font> A third line of text<br/>
<a id="00:01:59" class="ts">[00:01:59]</a> <font class="mn">Name4</font> The final line of text<br/>
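I was able to get the timestamps and names into their respective columns, but I'm having trouble figuring out how to get each line of text into its corresponding row.
Here is my code so far: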
Dim i As Integer
Dim Timestamp As Object
Dim Name As Object

my_url = "path_to_url.html"
Set html_doc = CreateObject("htmlfile")
Set xml_obj = CreateObject("MSXML2.XMLHTTP")

xml_obj.Open "GET", my_url, False
xml_obj.send
html_doc.body.innerHTML = xml_obj.responseText
Set xml_obj = Nothing

Set Timestamp = html_doc.body.getElementsByTagName("a")
Set Name = html_doc.body.getElementsByTagName("font")

i = 2
For Each itm In Timestamp
    If itm.getAttribute("className") = "ts" Then
        Cells(i, 1).Value = itm.innerText
        i = i + 1
    End If
Next

i = 2
For Each itm In Name
    If itm.getAttribute("className") = "mn" Then
        Cells(i, 2).Value = itm.innerText
        i = i + 1
    End If
Next
I was thinking of maybe using the <br/> somehow together with LEFT, but I'm not sure that's the best approach. Thanks in advance.
As long as this is the only content in the response and there are no other parts, you can do something like this:
EDIT: modified to split on something other than <br>
Sub Tester()

    Const RW_START As Long = 5
    Const SPLITTER As String = "{xxxx}"

    Dim i As Long, html_doc As Object, itm
    Dim Timestamp As Object
    Dim Name As Object
    Dim arr

    Set html_doc = CreateObject("htmlfile")
    html_doc.body.innerHTML = Range("A1").Value 'for my testing...

    Set Timestamp = html_doc.body.getElementsByTagName("a")
    Set Name = html_doc.body.getElementsByTagName("font")

    'write the timestamps, then blank them out in the document
    i = RW_START
    For Each itm In Timestamp
        If itm.getAttribute("className") = "ts" Then
            Cells(i, 1).Value = itm.innerText
            itm.innerText = "" '<<<
            i = i + 1
        End If
    Next

    'write the names, then replace each one (except the first) with the marker
    i = RW_START
    For Each itm In Name
        If itm.getAttribute("className") = "mn" Then
            Cells(i, 2).Value = itm.innerText
            itm.innerText = IIf(i = RW_START, "", SPLITTER) '<<<
            i = i + 1
        End If
    Next

    'get the remaining text and split on the SPLITTER marker
    arr = Split(html_doc.body.innerText, SPLITTER)
    i = RW_START
    For Each itm In arr
        itm = Trim(itm)
        'remove trailing line breaks (vbCr/vbLf) left behind by <br/>
        Do While Right(itm, 1) = vbLf Or Right(itm, 1) = vbCr
            itm = Left(itm, Len(itm) - 1)
        Loop
        Cells(i, 3).Value = Trim(itm)
        i = i + 1
    Next

End Sub
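
If overwriting the elements' text feels too hacky, another option is to walk the direct children of the body and collect only the text nodes. The following is an untested sketch rather than a drop-in replacement. It assumes the markup looks exactly like the sample in the question (each record ends with a <br/>), that the HTML is in Range("A1") as in the test code above, and that the "htmlfile" object exposes childNodes, nodeType and nodeValue for late-bound calls. The name TextNodeSketch is made up for illustration.

```vba
Sub TextNodeSketch()

    Const RW_START As Long = 5      'same start row as in Tester (assumption)
    Const TEXT_NODE As Long = 3     'DOM nodeType value for text nodes
    Const ELEMENT_NODE As Long = 1  'DOM nodeType value for elements

    Dim html_doc As Object, nodes As Object, nd As Object
    Dim n As Long, rw As Long, txt As String

    Set html_doc = CreateObject("htmlfile")
    html_doc.body.innerHTML = Range("A1").Value 'assumed location of the HTML

    Set nodes = html_doc.body.childNodes
    rw = RW_START

    For n = 0 To nodes.Length - 1
        Set nd = nodes.Item(n)
        If nd.NodeType = TEXT_NODE Then
            'loose text directly under <body>: accumulate it for the current row
            txt = txt & nd.NodeValue
        ElseIf nd.NodeType = ELEMENT_NODE Then
            If UCase(nd.nodeName) = "BR" Then
                '<br/> ends the record: drop line breaks, trim, write, next row
                txt = Replace(Replace(txt, vbCr, ""), vbLf, "")
                Cells(rw, 3).Value = Trim(txt)
                txt = ""
                rw = rw + 1
            End If
        End If
    Next n

    'write any text that follows the last <br/>
    txt = Trim(Replace(Replace(txt, vbCr, ""), vbLf, ""))
    If Len(txt) > 0 Then Cells(rw, 3).Value = txt

End Sub
```

Because the row only advances on a <br/>, a record with no loose text should still leave column 3 aligned with the timestamp and name columns, and the document itself is left unmodified.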