简体   繁体   中英

using MS Excel VBA to extracting data from complex HTML/JS

Short introduction, i consider myself as a intermediate VBA coder without any significant HTML experience. I would like to extract data from a HTML/JS webpage using MS Excel VBA. I have spent couple of hours testing my code on various pages as well as looking for training materials and various forums and Q&A pages.

I am desperately asking for you help. (Office 2013, IE 11.0.96)

The goal is to get the FX rate of a certain bloomberg webpage. The long term goal is to run a macro on various exchange rates and get the daily rate out of the system to an excel table per working day, but i will be handle that part.

I would be happy either with

(1)the current rate (span class="priceText__1853e8a5") or

(2) previous closing (section class="dataBox opreviousclosingpriceonetradingdayago numeric") or

(3) opening rate (section class="dataBox openprice numeric").

My issue is that I cannot fetch the part of the html code where the rate is.

Dim IE As Object
Dim div As Object, holdingsClass As Object, botoes As Object
Dim html As HTMLDocument
Set IE = CreateObject("InternetExplorer.Application")
    With IE
        .Visible = False
        .Navigate "https://www.bloomberg.com/quote/EURHKD:CUR"
        Do Until .ReadyState = 4: DoEvents: Loop
    End With

Set html = IE.document
Set div = IE.document.getElementById("leaderboard") 'works just fine, populates the objects
Set holdingsClass = IE.document.getElementsByclass("dataBox opreviousclosingpriceonetradingdayago numeric") 'i am not sure is it a class element at all
Set botoes = IE.document.getElementsByTagName("dataBox openprice numeric") 'i am not sure is it a tag name at all

    Range("a1").Value = div.textContent 'example how i would place it by using .textContent 
    Range("A2").Value = holdingsClass.textContent
    Range("A3").Value = botoes.textContent

Much appreciate your help!

Instead of digging through html why not use Bloomberg API to request the specific rate?

Likely would be faster and would save you a lot of time in the future doing the same kind of thing.

Please see my similiar project where I create a macro to pull historical FX rates from the European central bank. https://github.com/dmegaffi/VBA-GET-Requests/blob/master/FX%20-%20GET.xlsm

If you right-click the webpage element you want in chrome and select inspect, it'll bring up the details of that element. You can also press f12 to bring up the HTML of any page. This also works in other browsers.

Is this the element you're looking for? screen shot of mentioned webpage

Based on your code above, you could reference this element with IE.document.getElementsByclass("priceText__1853e8a5"). Elements in HTML can share classes but can't share ID's, so if there is another element with the class priceText__1853e8a5 it won't work since it won't select a single element. Then, of course, you have to select the text within the element since at this point you'd just have the and would need the text inside of it. Hope this helps.

To address your questions generally, see below.

(1)the current rate (span class="priceText__1853e8a5")

That can be written as a CSS query selector of:

span.priceText__1853e8a5

(2) previous closing (section class="dataBox opreviousclosingpriceonetradingdayago numeric")

That can be written as a CSS query selector of:

.dataBox.opreviousclosingpriceonetradingdayago.numeric

(3) opening rate (section class="dataBox openprice numeric")

That can be written as a CSS query selector of:

.dataBox.openprice.numeric

They are applied with querySelector or querySelectorAll (if more than one match and a later match than the first is required) of HTMLDocument .

Eg

Debug.Print IE.document.querySelector("span.priceText__1853e8a5").innerText

If more using querySelectorAll

IE.document.querySelectorAll("span.priceText__1853e8a5")(0).innerText

In the above you replace 0 with the appropriate index where your target element is found.


Observing the page the actual selectors appear to be as follows but I think this website is probably using ecmascript syntax that is not supported on legacy browsers ie Internet Explorer or is attempting blocked cross domain requests.

Option Explicit

Public Sub GetInfo()
    Dim IE As New InternetExplorer
    With IE
        .Visible = True
        .navigate "https://www.bloomberg.com/quote/EURHKD:CUR"

        While .Busy Or .readyState < 4: DoEvents: Wend
        With .document
            Debug.Print "Current: " & .querySelector(".priceText__1853e8a5").innerText
            Debug.Print "Prev close: " & .querySelector(".value__b93f12ea").innerText
            Debug.Print "Open: " & .querySelector(".value__b93f12ea").innerText
        End With

        .Quit
    End With
End Sub

Using Selenium Basic and Chrome the page renders fine:

Option Explicit
Public Sub GetInfo()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const URL = "https://www.bloomberg.com/quote/EURHKD:CUR"
    With d
        .Start "Chrome"
        .get URL
        Debug.Print "Current: " & .FindElementByCss(".priceText__1853e8a5").Text
        Debug.Print "Prev close: " & .FindElementByCss(".value__b93f12ea").Text
        Debug.Print "Open: " & .FindElementByCss(".value__b93f12ea").Text
        .Quit
    End With
End Sub

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM