How to capture HTML of redirect page before it redirects?

I am trying to read the HTML of a page that contains a non-delayed redirect. The following snippet (C#) will give me the destination/redirected page, not the initial one I need to see:

using System.Net;
using System.Text;

public class SomeClass {
    public static void Main() {
        byte[] data = new WebClient().DownloadData("http://SomeUrl.com");
        System.Console.WriteLine(Encoding.ASCII.GetString(data));
    }
}

Is there a way to get the HTML of a redirecting page? (I prefer .NET but a snippet in Java or Python would be fine too. Thx!)

Unless the redirect is done on the client side, you can't. If the redirect is done on the server side, no HTML is actually sent to the client; the response headers simply point the client at the new location.

It would take more work, but rather than using WebClient, use HttpWebRequest and set the AllowAutoRedirect property to false. The 3xx response is then returned to you instead of being followed, so you can read any response text (and some pages do have response text along with the redirect) from the response object. If the request does throw a WebException, the same response is available from the exception's Response property. After that, you can issue another HttpWebRequest for the redirect URL (specified in the Location response header).
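A minimal sketch of the HttpWebRequest approach, using the placeholder URL from the question. The class and helper names (RedirectProbe, ResolveLocation) are illustrative and not part of any library, and the catch block covers the case where the response only arrives through a WebException:

```csharp
using System;
using System.IO;
using System.Net;

public static class RedirectProbe
{
    // Resolves a Location header value (which may be relative) against the request URL.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        return new Uri(requestUri, location);
    }

    public static void Main()
    {
        var url = new Uri("http://SomeUrl.com"); // placeholder URL from the question
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.AllowAutoRedirect = false; // hand 3xx responses back instead of following them

        HttpWebResponse response;
        try
        {
            response = (HttpWebResponse)request.GetResponse();
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse)
        {
            // If the request throws, the response is still readable from the exception.
            response = (HttpWebResponse)ex.Response;
        }

        using (response)
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine((int)response.StatusCode);
            Console.WriteLine(reader.ReadToEnd()); // body of the redirecting page, if any

            string location = response.Headers["Location"];
            if (location != null)
            {
                // Where the redirect points; request this URL next if you need it.
                Console.WriteLine(ResolveLocation(url, location));
            }
        }
    }
}
```

Resolving the Location value against the original URL matters because some servers send a relative path there rather than an absolute URL.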

You might be able to do something similar with WebClient if you create a derived class, MyWebRequest, where you override the GetWebRequest method and set the AllowAutoRedirect property on the request it returns. I don't know what kind of exception, if any, the DownloadData method will throw if you do something like that.
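A rough sketch of that derived-class idea, assuming the request created for an http URL is an HttpWebRequest. The subclass name NoRedirectWebClient is hypothetical, and how DownloadData behaves on a 3xx response is left open here, as it is in the answer:

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical subclass; WebClient.GetWebRequest is the protected virtual hook being overridden.
public class NoRedirectWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        var http = request as HttpWebRequest;
        if (http != null)
        {
            http.AllowAutoRedirect = false; // do not follow 3xx responses
        }
        return request;
    }
}

public class SomeClass
{
    public static void Main()
    {
        using (var client = new NoRedirectWebClient())
        {
            // Same call as in the question, but the redirect is no longer followed.
            byte[] data = client.DownloadData("http://SomeUrl.com");
            Console.WriteLine(Encoding.ASCII.GetString(data));

            // The redirect target, if the server sent one.
            Console.WriteLine(client.ResponseHeaders["Location"]);
        }
    }
}
```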

As somebody said previously, this will only work for pages that redirect through the client, typically with a 301 or 302 response. If the redirection happens entirely on the server, you'd never know it.

If you want to get the source code of an HTML page, you can use this tool: http://www.selfseo.com/html_source_view.php

Simplest answer would be to add the current page onto the QueryString component of the redirect when redirecting, for instance:

Response.Redirect(newPage + "?FromPage=" + Server.UrlEncode(Request.Url.ToString()));

Then the new page could see where you came from by simply looking at Request.QueryString["FromPage"].
