简体   繁体   中英

Crawl to final URL from within Excel VBA

I have a list of domain names, and many of them redirect me to the same domain. For instance... foo1.com, foo2.csm and foo3.com all deposit me at foo.com.

I'm trying to deduplicate the list of domains by writing a VBA script to load the final page and extract it's URL.

I've started from this article which retrieve's the page's title ( http://www.excelforum.com/excel-programming-vba-macros/355192-can-i-import-raw-html-source-code-into-excel.html ), but can't figure out how to modify it to get the final URL (from which I can extract the URL.

Can anyone please point me in the right direction?

Try this, need to look at .LocationURL:

Public Function gsGetFinalURL(rsURL As String) As String
Dim ie As Object

Set ie = CreateObject("InternetExplorer.Application")

With ie
    .navigate rsURL
    Do While .Busy And Not .ReadyState = 4
        DoEvents
    Loop
    gsGetFinalURL = .LocationURL
    .Quit
End With

Set ie = Nothing
End Function

I haven't tried it on a huge variety of URLs, just the one you gave and a couple of others. If it is an invalid URL it will return what is passed. You can use the code from the original function to check and handle accordingly.

Add a reference to "Microsoft XML, v3.0" (or whatever version you have)

Sub tester()
    Debug.Print CheckRedirect("adhpn2.com")
End Sub

Function CheckRedirect(URL As String)
    If Not UCase(URL) Like "HTTP://*" Then URL = "http://" & URL
    With New msxml2.ServerXMLHTTP40
        .Open "HEAD", URL, False
        .send
        CheckRedirect = .getOption(-1)
    End With
End Function

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM