How to get hyperlinks that JavaScript writes into a web page, using a WebBrowser control

webBrowser1.Navigate(myurl);
HtmlElementCollection links = webBrowser1.Document.GetElementsByTagName("HTML");
foreach (HtmlElement link in links)
{
    MessageBox.Show(link.InnerHtml);
}

With the code above I can get the whole HTML document of the web page, but what I really want is the hyperlink URLs (the "href" values of the "a" elements) in that document.

I use:

MessageBox.Show(link.GetAttribute("href"));

but it returns null.

Could someone help me solve this with a simple function call rather than regular expressions?

The HTML code with JavaScript is shown as follows:

<HTML>
<HEAD>
<TITLE>Test for me</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8" />
<SCRIPT LANGUAGE="JavaScript">
<!--


function window::onload()
{

 result.innerHTML = 
      "<br><center><font size=+1><a href='MyMain.aspx' target='_parent'>Back to My Main        Page</a></font><br>"
    + "<font size=+2><b><a href='http://MySub.asp'>Launch My Application</a></b></font></center>";     
}

-->
</SCRIPT>
</HEAD>

<BODY>
<div id="result" />
</BODY>
</HTML>

This is almost a complete guess, but:

  • You're likely going to have to wait until the document has actually loaded before you grab your content.
  • You need to grab the right elements: the "A" tags, not the "HTML" tag.

Going by http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.documentcompleted.aspx, I'd guess the following code might work.

First, set up an event handler:

// Add an event handler that processes the document after it loads.
webBrowser1.DocumentCompleted +=
    new WebBrowserDocumentCompletedEventHandler(ProcessDocument);

Elsewhere, define the handler (and what needs to happen):

private void ProcessDocument(object sender,
    WebBrowserDocumentCompletedEventArgs e)
{
    // The sender is the WebBrowser control that raised the event.
    var webBrowser1 = (WebBrowser)sender;

    // Collect every anchor element and show its href attribute.
    HtmlElementCollection links = webBrowser1.Document.GetElementsByTagName("A");
    foreach (HtmlElement link in links)
    {
        MessageBox.Show(link.GetAttribute("href"));
    }
}

And finally, start the navigation:

webBrowser1.Navigate(myurl); 

The problem is that the MSDN documentation doesn't say much about what has to happen before a document counts as "completely loaded".
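One thing that is commonly reported about DocumentCompleted is that it can fire once per frame rather than once per page, so a handler may want to ignore everything except the top-level document. Below is a minimal sketch of such a guard. The class name LinkDumper and the method name OnDocumentCompleted are made up for illustration, and the guard by itself does not guarantee that script-generated links already exist.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper class; only the WebBrowser members are real API.
static class LinkDumper
{
    public static void OnDocumentCompleted(object sender,
        WebBrowserDocumentCompletedEventArgs e)
    {
        var browser = (WebBrowser)sender;

        // Skip completions reported for frames: only continue when the
        // completed URL is the one the control itself is showing.
        if (browser.Url == null || e.Url != browser.Url)
            return;

        // Skip if the control still reports that it is loading.
        if (browser.ReadyState != WebBrowserReadyState.Complete)
            return;

        // Print the href of every anchor currently in the document.
        foreach (HtmlElement link in browser.Document.GetElementsByTagName("A"))
        {
            Console.WriteLine(link.GetAttribute("href"));
        }
    }
}
```

It would be wired up with webBrowser1.DocumentCompleted += LinkDumper.OnDocumentCompleted; before calling Navigate.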

Edit: I finally tried it in LINQPad, and it doesn't look like the control exposes anything relating to the "window load" event, at least directly. I would wager the DocumentCompleted event is fired more like a "DOMReady" event. The following is a bit of a hack, but on the third invocation of the DocumentTitleChanged event it does grab the contents of the script-generated hrefs. PLEASE NOTE: the only reason the handler is invoked a third time is that my JavaScript changes the title!

void Main()
{
    WebBrowser webBrowser1 = new WebBrowser();
    webBrowser1.DocumentTitleChanged +=
        new EventHandler(ProcessDocument);
    webBrowser1.Navigate("http://localhost/test/test.html"); 

    Console.ReadLine();
}

// Define other methods and classes here

private void ProcessDocument(object sender,
    EventArgs e)
{
    var webBrowser1 = (WebBrowser)sender;
    Console.WriteLine("ProcessDocument BEGIN");
    HtmlElementCollection links = webBrowser1.Document.GetElementsByTagName("A");
    foreach (HtmlElement link in links)
    {
        Console.WriteLine(link.GetAttribute("href"));
    }
    Console.WriteLine("ProcessDocument END");
    Console.Out.Flush();
}

Where your HTML is:

<HTML>
<HEAD>
<TITLE>Test for me</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8" />
<SCRIPT LANGUAGE="JavaScript">
<!--


function foo()
{
  var result = document.getElementById('result');
  result.innerHTML =
      "<br><center><font size=+1><a href='MyMain.aspx' target='_parent'>Back to My Main        Page</a></font><br>"
    + "<font size=+2><b><a href='http://MySub.asp'>Launch My Application</a></b></font></center>";
  document.title += "Hack..Aacklgahala, ribbit";
}

-->
</SCRIPT>
</HEAD>

<BODY onload="foo()">
<a href="http://google.com">bar</a>
<div id="result" />
</BODY>
</HTML>
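If relying on a title change feels too fragile, another option might be to poll the document for a short while after navigating and stop as soon as the expected number of anchors shows up. The following is only a sketch under assumptions: the class name LinkPoller, the 200 ms interval, the 25-attempt limit and the expectedCount parameter are all invented for illustration, and it has to run on a UI thread with a message loop (for example inside a form).

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper; names, interval and retry limit are assumptions.
sealed class LinkPoller
{
    private readonly WebBrowser browser;
    private readonly System.Windows.Forms.Timer timer =
        new System.Windows.Forms.Timer { Interval = 200 };
    private readonly int expectedCount;
    private int attemptsLeft = 25;

    public LinkPoller(WebBrowser browser, int expectedCount)
    {
        this.browser = browser;
        this.expectedCount = expectedCount;
        timer.Tick += OnTick;
    }

    public void Start() { timer.Start(); }

    private void OnTick(object sender, EventArgs e)
    {
        // Document is null until the control has a page loaded.
        HtmlDocument doc = browser.Document;
        int found = doc == null ? 0 : doc.GetElementsByTagName("A").Count;

        // Keep waiting while links are missing and attempts remain.
        if (found < expectedCount && --attemptsLeft > 0)
            return;

        // Either enough links appeared or we ran out of attempts.
        timer.Stop();
        if (doc == null)
            return;

        foreach (HtmlElement link in doc.GetElementsByTagName("A"))
        {
            Console.WriteLine(link.GetAttribute("href"));
        }
    }
}
```

For the test page above, which ends up with three anchors (one static, two written by the script), it could be started right after Navigate with new LinkPoller(webBrowser1, 3).Start();.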
