
Get the “redirected to” page

I am trying to parse the home page of a site, but it is only reachable through a redirect from another page, so I can only get the HTML of the redirecting page.

How can I get the HTML of the "redirected to" page?

Here is an example: I can fetch a page a.html, which redirects me to b.html when I open it in a browser. I want to parse b.html, but when I open b.html directly it requires POST parameters that are sent from a.html to b.html during the redirect.

Edit: note that the "redirected to" page has a relative path, so I do the following:

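$needle = "window.location = \"";
$pos = strpos($result, $needle);
if ($pos !== false) {
    // Insert the site root right after the opening quote so "/home" becomes absolute.
    $res = substr_replace($result, "https://thecompletepath", $pos + strlen($needle), 0);
    echo $res;
}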

The redirect is done through JavaScript, as follows:

<script type="text/javascript" charset="utf-8">
    escapeIfModal();
    LoadingScreen.start();
    window.location = "/home";
</script>

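You can use cURL to follow redirects as the browser would. Note that cURL only follows HTTP redirects (3xx responses with a Location header); it does not execute JavaScript, so a window.location redirect is not followed automatically.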

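$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.example.com/a.html");
curl_setopt($ch, CURLOPT_HEADER, true);          // include response headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow HTTP 3xx Location redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the response instead of printing it
$a = curl_exec($ch); // headers of each hop, followed by the body of the last page reached ("b.html")
$finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // URL of the last page reached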
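
Since the redirect in the question is done with window.location rather than an HTTP header, one possible approach is to pull the target out of the first page's HTML and request it yourself, reusing the same cookies. The following is only a rough sketch under assumptions: the function names extractJsRedirect and fetchPage, the www.example.com base URL, and the temporary cookie file are all hypothetical, and the regular expression only covers simple string assignments like the one shown in the question.

```php
<?php
// Hypothetical helper: find the target of a `window.location = "..."`
// assignment in an HTML page and resolve it against the site root.
function extractJsRedirect(string $html, string $baseUrl): ?string
{
    // Matches window.location = "..." and window.location.href = '...'
    $pattern = '/window\.location(?:\.href)?\s*=\s*["\']([^"\']+)["\']/';
    if (!preg_match($pattern, $html, $m)) {
        return null; // no JavaScript redirect found
    }
    $target = $m[1];
    if (preg_match('#^https?://#i', $target)) {
        return $target; // already an absolute URL
    }
    // Assumption: the path is relative to the site root, as "/home" is.
    return rtrim($baseUrl, '/') . '/' . ltrim($target, '/');
}

// Hypothetical helper: fetch a URL with cURL, storing and sending cookies
// through one file so the second request can reuse the first one's session.
function fetchPage(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // write cookies here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read cookies from here
    return curl_exec($ch); // page body as a string, or false on failure
}

// Usage sketch (performs network requests, so it is left commented out):
// $cookies = tempnam(sys_get_temp_dir(), 'cookies');
// $a = fetchPage('https://www.example.com/a.html', $cookies);
// $next = is_string($a) ? extractJsRedirect($a, 'https://www.example.com') : null;
// $b = $next !== null ? fetchPage($next, $cookies) : null;
```

Whether the second request succeeds depends on what the target page really checks. If it relies on session cookies set by the first page, sharing the cookie file may be enough; if it truly needs POST data, those fields would have to be sent as well.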

Alternatively, use file_get_contents with a stream context that follows HTTP redirects:

$context = stream_context_create(
    array(
        'http' => array(
            'follow_location' => true
        )
    )
);

$html = file_get_contents('http://www.example.com/a.html', false, $context);
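
To see which redirects file_get_contents followed, you can inspect the response headers that PHP's HTTP wrapper places in $http_response_header after the call. The helper below is a small sketch, not part of the original answer: the name lastLocation is hypothetical, and it simply returns the value of the last Location header in such a header list.

```php
<?php
// Hypothetical helper: return the value of the last "Location:" header
// in a list of raw response header lines, or null if there is none.
function lastLocation(array $headers): ?string
{
    $last = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive, so compare without regard to case.
        if (stripos($line, 'Location:') === 0) {
            $last = trim(substr($line, strlen('Location:')));
        }
    }
    return $last;
}

// Usage sketch (performs a network request, so it is left commented out):
// $html = file_get_contents('http://www.example.com/a.html');
// $redirectedTo = lastLocation($http_response_header);
```

If this returns null, no HTTP redirect took place, which is what you would expect when the page redirects only through JavaScript.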
