
How to “manually” go back with a WebBrowser?

I'm working on a web scraper that sometimes needs to remember a particular page, then go to some other pages and then go back to that page. Currently I just save the URL of the page, but that doesn't work for pages like Google Maps, where the URL is always the same.

I can see that the GoBack method does go back to the previous page, so somehow the WebBrowser remembers what the previous page was. How can I do this manually? I could count how many pages have been visited since the page I want to return to and then call GoBack that many times, but that's pretty unreliable and inelegant. So I wonder how I could implement a GoBackToAParticularPage method.

There is one thing I think would get me closer to a solution: saving the URLs of all frames and then restoring them when going back to that page. I think that would solve at least the Google Maps problem. I have not tested it yet, and I don't know what the proper way to do this would be. I would need to wait for the frames to exist before setting their URLs.

You can use

webBrowser1.Document.Window.History.Go(x);

where x is an int signifying the relative position in the browser's history.

x=-2 would navigate two pages back.
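To get from a relative offset to something like the GoBackToAParticularPage method in the question, one option is to count history entries yourself and convert a remembered position into the x that History.Go expects. The following is only a minimal sketch: HistoryBookmarks and its members are hypothetical names, and it assumes each call to RecordNavigation corresponds to exactly one new history entry, which frame navigations and redirects can easily violate.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the WebBrowser API): tracks the current
// depth in the session history so a remembered page can be turned into a
// relative offset for History.Go(x).
public class HistoryBookmarks
{
    private int _position; // index of the current history entry
    private readonly Dictionary<string, int> _marks = new Dictionary<string, int>();

    // Call once per NEW history entry (assumption: one call per page visit).
    public void RecordNavigation()
    {
        _position++;
    }

    // Remember the current entry under a name of your choosing.
    public void Mark(string name)
    {
        _marks[name] = _position;
    }

    // Relative offset (zero or negative) from the current entry to the mark.
    public int OffsetTo(string name)
    {
        int marked;
        if (!_marks.TryGetValue(name, out marked))
            throw new KeyNotFoundException("No bookmark named '" + name + "'.");
        return marked - _position;
    }

    // Call after actually moving through history to keep the tracker in sync.
    public void MoveBy(int offset)
    {
        _position += offset;
    }
}
```

With this in place, going back would look roughly like: compute int x = bookmarks.OffsetTo("map"), call webBrowser1.Document.Window.History.Go(x) if x is not zero, then call bookmarks.MoveBy(x). The navigation events raised by the Go call itself must not be counted as new entries, otherwise the tracker drifts.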

Update: More info on HtmlHistory.Go()

Try this:

javascript:history.go(-1)

I know a few things have already been said, so I won't repeat them. However, if you really want to use a JavaScript method (i.e. the JavaScript history object instead of the WebBrowser control's history object) and are wondering how, there are ways to do this. You can use .InvokeScript in .NET WB controls, or, if you need something that works in both pre-.NET and .NET versions of the WB control, you can use .execScript. It also lets you choose the language of the script you want to execute, i.e. "JScript" or "VBScript". Here is the one-liner:

WebBrowser1.Document.parentWindow.execScript "alert('hello world');", "JScript" 

The good thing about using the JavaScript history object is this: if you suppress history in the WebBrowser control by passing the number "2" (the navNoHistory flag) to the .navigate method, going back to the page where history was suppressed will not work through the WB control, but it will work through JavaScript's history object. This is an advantage.

Once again, this is just a backwards-compatible supplement to the ideas already discussed in this post, plus a few other tidbits not mentioned.

Let me know if I can be of further help, since an answer was already accepted.

You may achieve your task with the JavaScript history object.

<FORM><INPUT TYPE="BUTTON" VALUE="Go Back" 
ONCLICK="history.go(-1)"></FORM>

Also check the JavaScript History Object reference for more on the history information.

Browser history, by design, is opaque; otherwise it opens a security hole: Do you really want every page you visit to have visibility as to what pages/sites you've been visiting? Probably not.

To do what you want, you'll need to implement your own stack of URIs, tracking what needs to be revisited.
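As a rough illustration of such a stack, the sketch below keeps the visited URIs and can be rewound to a particular one. UriTrail and TryRewindTo are hypothetical names, and the approach only helps when the URI actually identifies the page, which the question notes is not the case for Google Maps.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: a stack of visited URIs that can be rewound to a target.
public class UriTrail
{
    private readonly Stack<Uri> _visited = new Stack<Uri>();

    public int Count { get { return _visited.Count; } }

    // Record a page visit (most recent on top).
    public void Visit(Uri uri)
    {
        if (uri == null) throw new ArgumentNullException("uri");
        _visited.Push(uri);
    }

    // Pops every entry above the most recent visit to target and returns true.
    // If target was never visited, the trail is left untouched and false is returned.
    public bool TryRewindTo(Uri target)
    {
        if (!_visited.Contains(target)) return false;
        while (!_visited.Peek().Equals(target))
        {
            _visited.Pop();
        }
        return true;
    }
}
```

After a successful TryRewindTo, the caller would navigate the control to the target itself, for example with webBrowser1.Navigate(target).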

You don't want to use history.go(-1) because it is unreliable. But you can't use the URL either, because there are pages like Google Maps where the URL is always the same.

If the URL is the same but the content is different, then it means that values to determine the page's content are being pulled from somewhere other than the URL.

Where could this be?

Your most likely suspect is the posted form-collection, but data could also be coming from the cookie.

I think it makes a lot more sense to index the absolute location than a relative location, because as you noted, relative locations can be unreliable. The problem is that you need to get all the data that is being sent to the web server, to understand what its actual absolute location is (because the URI is not sufficient).

The way to do this is to create a local copy of the page and replace the submission URL (this could be in a link, a form, or the JavaScript) with a URL on your server. Then, when you click something on the Google Maps page to trigger a change (one that seems not to affect the URL), you will receive that data on your server and will be able to determine the actual location.

Think about it like a querystring.

If I have

<form action="http://myhost.com/page.html" method="get">
   <input type="hidden" name="secret_location_parameter" value="mrbigglesworth" />
   <input type="submit" />
</form>

and I click the submit button, I get taken to the url

 http://myhost.com/page.html?secret_location_parameter=mrbigglesworth

However, If I have

<form action="http://myhost.com/page.html" method="post">
   <input type="hidden" name="secret_location_parameter" value="mrbigglesworth" />
   <input type="submit" />
</form>

and I click the submit button, then I get taken to the url

 http://myhost.com/page.html

The server still receives secret_location_parameter=mrbigglesworth , but it gets it as a form value instead of a querystring value, so it isn't visible from the url. The server might render a different page depending on the secret_location_parameter value, but not change the url, and if a post method is used, then it will appear that multiple pages reside at the same url.
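The difference can be made concrete with a small sketch that encodes the same form field both ways. FormEncoding, Encode, and BuildGetUrl are hypothetical names; the encoding uses Uri.EscapeDataString, which is close to but not identical to what a browser does for form submissions (for example, spaces become %20 rather than +).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating where form values travel for GET vs. POST.
public static class FormEncoding
{
    // Percent-encodes name/value pairs and joins them as name=value&name=value.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    // GET: the encoded fields are appended to the URL, so they are visible in it.
    public static string BuildGetUrl(string action,
        IEnumerable<KeyValuePair<string, string>> fields)
    {
        return action + "?" + Encode(fields);
    }

    // POST: the URL stays as-is; the same encoded text is sent as the request body.
    public static string BuildPostBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return Encode(fields);
    }
}
```

For the form above, BuildGetUrl yields the URL with ?secret_location_parameter=mrbigglesworth appended, while the POST variant leaves the URL at http://myhost.com/page.html and carries the same text in the body, which is why the URL alone cannot tell the two resulting pages apart.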

My point is that you may be addressing the problem from the wrong angle, because you didn't understand what was going on under the hood. I am certainly making assumptions, but based on the way you asked your question I think this may be helpful for you.

If you don't need to see what is happening visually, there may be a more elegant way to navigate and parse URLs using the WebClient class. Perhaps elaborating on your particular program would yield clearer results.

Assuming that you have a WebBrowser control on a form and you are trying to implement "go back", the following is a solution. (If the assumption is wrong, please correct me.)

Add a WebBrowser, a TextBox, and a Button named btnBack.

The History variable also holds the URL data for navigation (but is not used currently).

C# solution

using System;
using System.Collections.Generic;
using System.Windows.Forms;

namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        // URLs seen so far, most recent on top (recorded but not yet used for navigation).
        private readonly Stack<string> History = new Stack<string>();

        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            WebBrowser1.Url = new Uri("http://maps.google.com");
        }

        // Handler for WebBrowser1.Navigating: show and record every URL being loaded.
        private void WebBrowser1_Navigating(object sender, WebBrowserNavigatingEventArgs e)
        {
            TextBox1.Text = e.Url.ToString();
            History.Push(e.Url.ToString());
        }

        // Handler for btnBack.Click: rely on the control's own history to go back.
        private void btnBack_Click(object sender, EventArgs e)
        {
            if (WebBrowser1.CanGoBack)
            {
                WebBrowser1.GoBack();
            }
        }
    }
}

VB solution

Public Class Form1
    ' URLs seen so far, most recent on top (recorded but not yet used for navigation).
    Dim History As New Stack(Of String)

    Private Sub Form1_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
        WebBrowser1.Url = New Uri("http://maps.google.com")
    End Sub

    Private Sub WebBrowser1_Navigating(ByVal sender As Object, ByVal e As System.Windows.Forms.WebBrowserNavigatingEventArgs) Handles WebBrowser1.Navigating
        TextBox1.Text = e.Url.ToString
        History.Push(e.Url.ToString)
    End Sub

    Private Sub btnBack_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnBack.Click
        If WebBrowser1.CanGoBack Then
            WebBrowser1.GoBack()
        End If
    End Sub
End Class

Programmatically add a marker element to the DOM for those pages you will later want to go back to. When backtracking through the browser history, check for that marker after each history.go(-1) and stop when you encounter it. This might prove unreliable in some cases, in which case remembering the depth level may serve as a backup approach.

You may need to experiment with the right time to insert the element, to make sure it is properly recorded in the history.
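A minimal sketch of the marker idea with the WinForms WebBrowser DOM wrappers could look like the following. HistoryMarker and the marker id are hypothetical, and whether an element added this way is still present after navigating back depends on how the page is restored from history, so treat it as something to experiment with rather than a guaranteed technique.

```csharp
using System.Windows.Forms;

// Hypothetical helper: tags the currently loaded document with a hidden element
// and later checks whether the current document carries that tag.
public static class HistoryMarker
{
    // Assumed id; any value unlikely to clash with the page's own ids will do.
    private const string MarkerId = "__scraper_marker__";

    // Adds an invisible marker element to the loaded document, if there is one.
    public static void Mark(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        if (doc == null || doc.Body == null) return;

        HtmlElement marker = doc.CreateElement("div");
        marker.Id = MarkerId;
        marker.Style = "display:none";
        doc.Body.AppendChild(marker);
    }

    // True if the current document contains the marker element.
    public static bool IsMarked(WebBrowser browser)
    {
        HtmlDocument doc = browser.Document;
        return doc != null && doc.GetElementById(MarkerId) != null;
    }
}
```

The backtracking loop would then go back one step, wait for the document to finish loading, call IsMarked, and stop as soon as it returns true or CanGoBack becomes false.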

In case anyone else can benefit from it, here is how I ended up doing it. The only caveat is that if the travel log has too many pages in between, the entry might no longer exist. There is probably a way to increase the history size, but since there has to be some limit, I use the TravelLog.GetTravelLogEntries method to check whether the entry still exists and, if not, use the URL instead.

Most of this code came from PInvoke.

using System;
using System.Runtime.InteropServices;
using System.Windows.Forms;
using System.Collections.Generic;

namespace TravelLogUtils
{
    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD87-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface ITravelLogEntry
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetTitle([Out] out IntPtr ppszTitle); //LPOLESTR LPWSTR

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetURL([Out] out IntPtr ppszURL); //LPOLESTR LPWSTR
    }

    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD85-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface IEnumTravelLogEntry
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int Next(
            [In, MarshalAs(UnmanagedType.U4)] int celt,
            [Out] out ITravelLogEntry rgelt,
            [Out, MarshalAs(UnmanagedType.U4)] out int pceltFetched);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int Skip([In, MarshalAs(UnmanagedType.U4)] int celt);

        void Reset();

        void Clone([Out] out IEnumTravelLogEntry ppenum);
    }

    public enum TLMENUF
    {
        /// <summary>
        /// Enumeration should include the current travel log entry.
        /// </summary>
        TLEF_RELATIVE_INCLUDE_CURRENT = 0x00000001,
        /// <summary>
        /// Enumeration should include entries before the current entry.
        /// </summary>
        TLEF_RELATIVE_BACK = 0x00000010,
        /// <summary>
        /// Enumeration should include entries after the current entry.
        /// </summary>
        TLEF_RELATIVE_FORE = 0x00000020,
        /// <summary>
        /// Enumeration should include entries which cannot be navigated to.
        /// </summary>
        TLEF_INCLUDE_UNINVOKEABLE = 0x00000040,
        /// <summary>
        /// Enumeration should include all invokable entries.
        /// </summary>
        TLEF_ABSOLUTE = 0x00000031
    }

    [ComVisible(true), ComImport()]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    [GuidAttribute("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8")]
    public interface ITravelLogStg
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int CreateEntry([In, MarshalAs(UnmanagedType.LPWStr)] string pszUrl,
            [In, MarshalAs(UnmanagedType.LPWStr)] string pszTitle,
            [In] ITravelLogEntry ptleRelativeTo,
            [In, MarshalAs(UnmanagedType.Bool)] bool fPrepend,
            [Out] out ITravelLogEntry pptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int TravelTo([In] ITravelLogEntry ptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int EnumEntries([In] int TLENUMF_flags, [Out] out IEnumTravelLogEntry ppenum);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int FindEntries([In] int TLENUMF_flags,
        [In, MarshalAs(UnmanagedType.LPWStr)] string pszUrl,
        [Out] out IEnumTravelLogEntry ppenum);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetCount([In] int TLENUMF_flags, [Out] out int pcEntries);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int RemoveEntry([In] ITravelLogEntry ptle);

        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int GetRelativeEntry([In] int iOffset, [Out] out ITravelLogEntry ptle);
    }

    [ComImport, ComVisible(true)]
    [Guid("6d5140c1-7436-11ce-8034-00aa006009fa")]
    [InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
    public interface IServiceProvider
    {
        [return: MarshalAs(UnmanagedType.I4)]
        [PreserveSig]
        int QueryService(
            [In] ref Guid guidService,
            [In] ref Guid riid,
            [Out] out IntPtr ppvObject);
    }

    public class TravelLog
    {
        public static Guid IID_ITravelLogStg = new Guid("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8");
        public static Guid SID_STravelLogCursor = new Guid("7EBFDD80-AD18-11d3-A4C5-00C04F72D6B8");

        //public static void TravelTo(WebBrowser webBrowser, int 
        public static ITravelLogEntry GetTravelLogEntry(WebBrowser webBrowser)
        {
            int HRESULT_OK = 0;

            SHDocVw.IWebBrowser2 axWebBrowser = (SHDocVw.IWebBrowser2)webBrowser.ActiveXInstance;
            IServiceProvider psp = axWebBrowser as IServiceProvider;
            if (psp == null) throw new Exception("Could not get IServiceProvider.");

            IntPtr oret = IntPtr.Zero;            
            int hr = psp.QueryService(ref SID_STravelLogCursor, ref IID_ITravelLogStg, out oret);            
            if ((oret == IntPtr.Zero) || (hr != HRESULT_OK)) throw new Exception("Failed to query service.");

            ITravelLogStg tlstg = Marshal.GetObjectForIUnknown(oret) as ITravelLogStg;
            Marshal.Release(oret); // the runtime wrapper holds its own reference; release the raw pointer
            if (null == tlstg) throw new Exception("Failed to get ITravelLogStg");            
            ITravelLogEntry ptle = null;

            hr = tlstg.GetRelativeEntry(0, out ptle);

            if (hr != HRESULT_OK) throw new Exception("Failed to get travel log entry with error " + hr.ToString("X"));

            Marshal.ReleaseComObject(tlstg);
            return ptle;
        }

        public static void TravelToTravelLogEntry(WebBrowser webBrowser, ITravelLogEntry travelLogEntry)
        {
            int HRESULT_OK = 0;

            SHDocVw.IWebBrowser2 axWebBrowser = (SHDocVw.IWebBrowser2)webBrowser.ActiveXInstance;
            IServiceProvider psp = axWebBrowser as IServiceProvider;
            if (psp == null) throw new Exception("Could not get IServiceProvider.");

            IntPtr oret = IntPtr.Zero;
            int hr = psp.QueryService(ref SID_STravelLogCursor, ref IID_ITravelLogStg, out oret);
            if ((oret == IntPtr.Zero) || (hr != HRESULT_OK)) throw new Exception("Failed to query service.");

            ITravelLogStg tlstg = Marshal.GetObjectForIUnknown(oret) as ITravelLogStg;
            Marshal.Release(oret); // the runtime wrapper holds its own reference; release the raw pointer
            if (null == tlstg) throw new Exception("Failed to get ITravelLogStg");

            hr = tlstg.TravelTo(travelLogEntry);

            if (hr != HRESULT_OK) throw new Exception("Failed to travel to log entry with error " + hr.ToString("X"));

            Marshal.ReleaseComObject(tlstg);
        }

        public static HashSet<ITravelLogEntry> GetTravelLogEntries(WebBrowser webBrowser)
        {
            int HRESULT_OK = 0;

            SHDocVw.IWebBrowser2 axWebBrowser = (SHDocVw.IWebBrowser2)webBrowser.ActiveXInstance;
            IServiceProvider psp = axWebBrowser as IServiceProvider;
            if (psp == null) throw new Exception("Could not get IServiceProvider.");

            IntPtr oret = IntPtr.Zero;
            int hr = psp.QueryService(ref SID_STravelLogCursor, ref IID_ITravelLogStg, out oret);
            if ((oret == IntPtr.Zero) || (hr != HRESULT_OK)) throw new Exception("Failed to query service.");

            ITravelLogStg tlstg = Marshal.GetObjectForIUnknown(oret) as ITravelLogStg;
            Marshal.Release(oret); // the runtime wrapper holds its own reference; release the raw pointer
            if (null == tlstg) throw new Exception("Failed to get ITravelLogStg");

            // Enumerate the travel log entries, one at a time.
            IEnumTravelLogEntry penumtle = null;
            hr = tlstg.EnumEntries((int)TLMENUF.TLEF_ABSOLUTE, out penumtle);
            Marshal.ThrowExceptionForHR(hr);

            HashSet<ITravelLogEntry> results = new HashSet<ITravelLogEntry>();
            ITravelLogEntry ptle = null;
            int fetched = 0;
            const int MAX_FETCH_COUNT = 1;

            // Next returns S_OK (0) while entries remain and S_FALSE (1) once exhausted.
            hr = penumtle.Next(MAX_FETCH_COUNT, out ptle, out fetched);
            Marshal.ThrowExceptionForHR(hr);

            while (hr == HRESULT_OK)
            {
                if (ptle != null) results.Add(ptle);
                hr = penumtle.Next(MAX_FETCH_COUNT, out ptle, out fetched);
                Marshal.ThrowExceptionForHR(hr);
            }

            Marshal.ReleaseComObject(penumtle);
            Marshal.ReleaseComObject(tlstg);

            return results;
        }
    }
}
