
Parse HTML text not in a tag using VBA


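I need to extract some text that is inside the body tag but is not wrapped in any HTML element. The problem is that this text is broken up by other tags, and each piece needs to go into its own cell.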

For example:

<a id="00:00:00" class="ts">[00:00:00]</a> <font class="mn">Name1</font> First bit of text<br/>
<a id="00:00:09" class="ts">[00:00:09]</a> <font class="mn">Name2</font> Second Line of Text<br/>
<a id="00:01:17" class="ts">[00:01:17]</a> <font class="mn">Name3</font> A third line of text<br/>
<a id="00:01:59" class="ts">[00:01:59]</a> <font class="mn">Name4</font> The final line of text<br/>

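I was able to get the timestamps and names into their respective columns, but I'm having trouble figuring out how to put each line of text into the matching row.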

Here is my code so far:

Dim i As Integer
Dim Timestamp As Object
Dim Name As Object
Dim my_url As String, html_doc As Object, xml_obj As Object, itm As Object

my_url = "path_to_url.html"
Set html_doc = CreateObject("htmlfile")
Set xml_obj = CreateObject("MSXML2.XMLHTTP")

xml_obj.Open "GET", my_url, False
xml_obj.send
html_doc.body.innerHTML = xml_obj.responseText
Set xml_obj = Nothing

Set Timestamp = html_doc.body.getElementsByTagName("a")
Set Name = html_doc.body.getElementsByTagName("font")

i = 2
For Each itm In Timestamp
    If itm.getAttribute("className") = "ts" Then
        Cells(i, 1).Value = itm.innerText
        i = i + 1
    End If
Next

i = 2
For Each itm In Name
    If itm.getAttribute("className") = "mn" Then
        Cells(i, 2).Value = itm.innerText
        i = i + 1
    End If
Next

I was thinking of maybe using the <br/> somehow together with LEFT, but I'm not sure that's the best approach. Thanks in advance.
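The <br/> idea can work directly on the raw markup, without the DOM. A minimal sketch, assuming every transcript line ends in <br/> (or <br>) and the spoken text always follows the closing </font> tag, as in the sample above. LineTextFromMarkup and FillTextColumn are hypothetical names, and HTML entities such as &amp; are left undecoded.

```vba
' Hypothetical helper: returns the loose text after the closing </font>
' tag of one transcript line, or "" if the line contains no </font>.
Function LineTextFromMarkup(ByVal fragment As String) As String
    Dim pos As Long
    pos = InStr(1, fragment, "</font>", vbTextCompare)
    If pos = 0 Then Exit Function
    LineTextFromMarkup = Trim$(Mid$(fragment, pos + Len("</font>")))
End Function

' Hypothetical driver: splits the raw HTML on its line breaks and writes
' each line's text to column C, starting at row 2 like the loops above.
Sub FillTextColumn(ByVal rawHtml As String)
    Dim pieces As Variant, piece As Variant
    Dim i As Long
    ' normalise <br/> and <br /> to <br> so a single Split handles all forms
    rawHtml = Replace(rawHtml, "<br />", "<br>", , , vbTextCompare)
    rawHtml = Replace(rawHtml, "<br/>", "<br>", , , vbTextCompare)
    pieces = Split(rawHtml, "<br>", -1, vbTextCompare)
    i = 2
    For Each piece In pieces
        ' only pieces containing a name tag are transcript lines; testing for
        ' the tag (rather than for non-empty text) keeps the rows aligned
        If InStr(1, piece, "</font>", vbTextCompare) > 0 Then
            Cells(i, 3).Value = LineTextFromMarkup(CStr(piece))
            i = i + 1
        End If
    Next
End Sub
```

To use it with the code above, call FillTextColumn xml_obj.responseText before xml_obj is set to Nothing. Because it matches on literal tag text, it breaks if the markup varies (extra attributes on </font> are fine, but text wrapped in further tags is not).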

As long as this is the only content in the response and there are no other sections, you can do something like this:

Edit: modified to split on a marker string instead of on the <br> line breaks.

Sub Tester()

    Const RW_START As Long = 5
    Const SPLITTER = "{xxxx}"
    Dim i As Integer, html_doc, itm
    Dim Timestamp As Object
    Dim Name As Object
    Dim arr, sep, txt

    Set html_doc = CreateObject("htmlfile")
    html_doc.body.innerHTML = Range("A1").Value 'testing only: in real use, assign xml_obj.responseText here


    Set Timestamp = html_doc.body.getElementsByTagName("a")
    Set Name = html_doc.body.getElementsByTagName("font")

    i = RW_START
    For Each itm In Timestamp
        If itm.getAttribute("className") = "ts" Then
            Cells(i, 1).Value = itm.innerText
            itm.innerText = "" '<<<
            i = i + 1
        End If
    Next

    i = RW_START
    For Each itm In Name
        If itm.getAttribute("className") = "mn" Then
            Cells(i, 2).Value = itm.innerText
            itm.innerText = IIf(i = RW_START, "", SPLITTER) '<<<
            i = i + 1
        End If
    Next

    'get the remaining text and split it on the SPLITTER markers
    arr = Split(html_doc.body.innerText, SPLITTER)
    i = RW_START
    For Each itm In arr
        'the <br> line breaks come back as vbCrLf (or vbLf) in innerText:
        'strip them, then trim the surrounding spaces
        itm = Replace(Replace(itm, vbCr, ""), vbLf, "")
        Cells(i, 3).Value = Trim(itm)
        i = i + 1
    Next

End Sub
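A different technique, which avoids blanking elements and splitting on a marker, is to walk the body's child nodes and collect the text nodes (DOM nodeType 3) that follow each font element with class "mn". A minimal sketch, assuming the htmlfile document has the flat structure of the sample (every tag a direct child of <body>) and that rows start at 2 as in the question; TextNodesToRows and CleanText are hypothetical names.

```vba
' Hypothetical routine: walks the direct children of <body>. Each
' <font class="mn"> starts a new row; the text nodes that follow it (up to
' the next such font) are joined and written to column C of that row.
Sub TextNodesToRows(ByVal html_doc As Object)
    Const RW_START As Long = 2
    Dim kids As Object, node As Object
    Dim k As Long, r As Long, txt As String
    Set kids = html_doc.body.childNodes
    r = RW_START - 1                      ' no row started yet
    For k = 0 To kids.Length - 1
        Set node = kids.Item(k)
        Select Case node.nodeType
            Case 1                        ' element node (a, font, br, ...)
                If UCase$(node.nodeName) = "FONT" Then
                    If node.className = "mn" Then
                        ' a new name: write out the previous row's text first
                        If r >= RW_START Then Cells(r, 3).Value = CleanText(txt)
                        r = r + 1
                        txt = ""
                    End If
                End If
            Case 3                        ' text node: the loose text we want
                txt = txt & node.nodeValue
        End Select
    Next
    ' flush the text collected after the last name
    If r >= RW_START Then Cells(r, 3).Value = CleanText(txt)
End Sub

' Hypothetical helper: turns line breaks into spaces and trims the result.
Private Function CleanText(ByVal s As String) As String
    CleanText = Trim$(Replace(Replace(s, vbCr, " "), vbLf, " "))
End Function
```

The timestamp <a> elements and <br> elements are simply skipped, so the document does not need to be modified. Whitespace between tags lands in the current row's text and is removed by CleanText.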
