Why does file_get_contents return garbled data?

Question

I am trying to fetch the HTML of the page below using some simple PHP.

URL: https://kat.cr/usearch/architecture%20category%3Abooks/

My code is:

$html = file_get_contents('https://kat.cr/usearch/architecture%20category%3Abooks/');
echo $html;

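Here file_get_contents works, but it returns scrambled data.

I have tried cURL as well as various functions such as htmlentities(), mb_convert_encoding(), and utf8_encode(), but I only get different variations of the scrambled text.

The page source says charset=utf-8, so I am not sure what the problem is.

Calling file_get_contents() on the base URL kat.cr returns the same mess.

What am I missing here?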

Answer 1

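The response is gzip-compressed. A browser decompresses it automatically when it fetches the page, so you need to decompress it yourself. To output it directly, you can also use readgzfile():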

readgzfile('https://kat.cr/usearch/architecture%20category%3Abooks/');
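If the page is needed as a string rather than echoed straight to the output, one option is to check for the gzip magic bytes and decode with gzdecode(). This is a minimal sketch, assuming the zlib extension is loaded; maybe_gunzip is a hypothetical helper name, not something from this answer:

```php
<?php
// Hypothetical helper: decode a body only if it looks like gzip data.
function maybe_gunzip(string $body): string
{
    // Every gzip stream starts with the two magic bytes 0x1f 0x8b.
    if (strncmp($body, "\x1f\x8b", 2) !== 0) {
        return $body; // not gzip, return unchanged
    }
    $decoded = gzdecode($body);
    // gzdecode() returns false on corrupt input; fall back to the raw body.
    return $decoded === false ? $body : $decoded;
}

// Usage (not executed here, it needs network access):
// echo maybe_gunzip(file_get_contents('https://kat.cr/usearch/architecture%20category%3Abooks/'));
```

Checking the magic bytes first means the same helper can be used whether or not the server compressed the response.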

Answer 2

The site's response is compressed, so it has to be decompressed to get back the original content.

The quickest way is to use gzinflate(), as shown below:

$html = gzinflate(substr(file_get_contents("https://kat.cr/usearch/architecture%20category%3Abooks/"), 10, -8));
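The offsets 10 and -8 strip the gzip wrapper: a gzip stream has a 10-byte header when no optional fields are present, and an 8-byte trailer holding the CRC32 and the uncompressed size. The sketch below reproduces this locally, without a network request, and compares it with gzdecode(), which parses the header itself. The sample string is made up for illustration:

```php
<?php
$original = '<html>architecture books</html>'; // made-up sample payload
$gz = gzencode($original); // full gzip stream: header + deflate data + trailer

// The approach above: drop the 10-byte header and the 8-byte trailer,
// then inflate the raw deflate data that remains.
$viaInflate = gzinflate(substr($gz, 10, -8));

// gzdecode() reads the header itself, so it also copes with optional
// header fields (file name, comment) that make the header longer than 10 bytes.
$viaDecode = gzdecode($gz);

var_dump($viaInflate === $original, $viaDecode === $original);
```

If a server ever sends a gzip header with optional fields, the fixed-offset version would fail while gzdecode() should still work, so gzdecode() is probably the safer choice where it is available (PHP 5.4 and later).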

Or, for a more advanced solution, consider the following function (found on this blog):

function get_url($url)
{
    // A User-Agent header is needed; otherwise some sites, such as google.com, won't send compressed content
    $opts = array(
        'http'=>array(
            'method'=>"GET",
            'header'=>"Accept-Language: en-US,en;q=0.8\r\n" .
                        "Accept-Encoding: gzip,deflate,sdch\r\n" .
                        "Accept-Charset:UTF-8,*;q=0.5\r\n" .
                        "User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:19.0) Gecko/20100101 Firefox/19.0 FirePHP/0.4\r\n"
        )
    );

    $context = stream_context_create($opts);
    $content = file_get_contents($url ,false,$context); 

    //If http response header mentions that content is gzipped, then uncompress it
    foreach($http_response_header as $c => $h)
    {
        if(stristr($h, 'content-encoding') and stristr($h, 'gzip'))
        {
            //Now lets uncompress the compressed data
            $content = gzinflate( substr($content,10,-8) );
        }
    }

    return $content;
}

echo get_url('http://www.google.com/');
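Since the question mentions cURL, here is an alternative sketch that lets cURL handle decompression. Setting CURLOPT_ENCODING to an empty string makes cURL advertise every encoding it supports and decode the response automatically. It assumes the curl extension is installed; fetch_decoded and the User-Agent value are hypothetical placeholders:

```php
<?php
// Hypothetical helper, not part of the answer above.
function fetch_decoded(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_ENCODING       => '',   // empty string: accept all supported encodings, decode automatically
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; example-fetcher)', // placeholder value
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        throw new RuntimeException('Request failed: ' . curl_error($ch));
    }
    return $body;
}

// Usage (not executed here, it needs network access):
// echo fetch_decoded('https://kat.cr/usearch/architecture%20category%3Abooks/');
```

With this approach there is no need to inspect the Content-Encoding header or strip the gzip wrapper by hand.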

Answer 1
2 2015-08-10 21:22:10

Answer 2
2 accepted 2015-08-10 21:26:57
