
Including an external webpage using PHP

How can I use PHP to include an external webpage? (Sort of like the WordPress theme preview.)

I want (X)HTML Strict compliant code: no iframe and preferably no JavaScript.

The idea is that I am making a sandbox for clients to view webpages in my controlled environment. In addition, the included webpages should not be viewable without the "sandbox" wrapper.

EDIT:

According to some commenters, GoDaddy has cURL. The next part of the question becomes: how do I strip out the header and footer of the HTML in PHP so that only the contents of the body tag remain? I would rather use PHP string functions than regex.

Try using cURL:

/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL.  Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}

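Call that function with your URL. It does not print anything itself: it returns an array, and the page markup is in the 'content' element, so echo that element to output the whole webpage into your PHP page. The 'errno' and 'errmsg' elements tell you whether the request failed.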
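As a minimal sketch of handling the array that get_web_page() returns, the helper below checks for a cURL error and the HTTP status before handing back the markup. The name render_fetched_page() is hypothetical and not part of the original answer; 'http_code' is one of the fields that curl_getinfo() fills in.

```php
<?php
// Hypothetical helper (not from the original answer): turn the array
// returned by get_web_page() into a string that is safe to echo.
function render_fetched_page(array $result)
{
    // A non-zero errno means cURL itself failed (DNS error, timeout, ...).
    if (!empty($result['errno'])) {
        return 'Fetch failed: ' . htmlspecialchars($result['errmsg']);
    }
    // 'http_code' comes from curl_getinfo(); treat 4xx/5xx as failures.
    if (isset($result['http_code']) && $result['http_code'] >= 400) {
        return 'Remote server answered with HTTP ' . (int) $result['http_code'];
    }
    return (string) $result['content'];
}

// Usage, assuming get_web_page() from the answer above is defined:
// echo render_fetched_page(get_web_page('http://example.com/'));
```

Returning a string instead of echoing directly leaves it to the sandbox wrapper to decide where the markup goes.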

However, you may need to rewrite links to assets such as stylesheets and images, for example with a regex that replaces "/image.jpg" with "http://mydomain.com/image.jpg".
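One way to sketch that link rewriting is shown here. The function name absolutize_links() and its parameters are assumptions for illustration; it only handles root-relative src and href attributes (those starting with a single "/"), not document-relative paths such as "image.jpg" or URLs inside CSS.

```php
<?php
// Hypothetical helper: prefix root-relative src/href values with a base URL.
function absolutize_links($html, $base)
{
    $base = rtrim($base, '/');
    // Match src="/..." or href="/..." but skip protocol-relative "//host/...".
    $pattern = '#\b(src|href)=(["\'])/(?!/)#i';
    return preg_replace_callback($pattern, function ($m) use ($base) {
        // $m[1] is the attribute name, $m[2] the opening quote character.
        return $m[1] . '=' . $m[2] . $base . '/';
    }, $html);
}

// Usage:
// echo absolutize_links('<img src="/image.jpg">', 'http://mydomain.com');
```

A callback is used rather than a plain replacement string so that characters in the base URL are never interpreted as backreferences.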

cURL is usually installed on shared hosts.

If you only want the page's body or head, you can use SimpleXML or regular expressions for that. (SimpleXML is great for traversing the document tree, but only if the HTML is well-formed XML.)
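Since the question prefers PHP string functions over regex, here is a rough sketch of cutting out the body with stripos(), strpos() and substr(). The name extract_body() is hypothetical, and the approach assumes the first "<body" and the last "</body" in the markup are the real tags (it would be fooled by those strings inside comments or scripts, or by a ">" inside a body attribute value).

```php
<?php
// Hypothetical helper: return the markup between <body ...> and </body>.
function extract_body($html)
{
    $open = stripos($html, '<body');       // case-insensitive search
    if ($open === false) {
        return $html;                      // no body tag: return input unchanged
    }
    $start = strpos($html, '>', $open);    // end of the opening tag
    if ($start === false) {
        return $html;
    }
    $start++;                              // first character after '>'
    $end = strripos($html, '</body');      // last closing tag
    if ($end === false || $end < $start) {
        return substr($html, $start);      // unclosed body: take the rest
    }
    return substr($html, $start, $end - $start);
}

// Usage:
// echo extract_body('<html><body class="x"><p>Hi</p></body></html>');
```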

PHP's file_get_contents() function can fetch URLs on other domains (provided allow_url_fopen is enabled), so you can retrieve external markup. However, simply outputting that markup has several problems: relative links stop working, and it exposes your site to cross-site scripting.

Although you said you don't want to use an iframe, the tag is valid XHTML 1.0 Transitional, and based on your description it is what I would recommend for compatibility and security reasons.

What you can do is use this:

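function __test($results){
    // Match absolute http:// URLs ending in an image extension. The lazy
    // quantifier and excluded characters stop each match at the first extension.
    $pattern = '/http:\/\/[^\s"\'<>]+?\.(jpeg|jpg|gif)/i';
    preg_match_all($pattern, $results, $array); // $array[0] holds the full matches

    $urls = array();
    foreach ($array[0] as $image)  // collect every match, not just the last one
    {
        $urls[] = "http://saxtorinc.com/$image"; // prefix with your own domain
    }
    return $urls; // empty array when no image URL was found
}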

Note that you have to replace the domain name with your own.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 