PHP Curl UTF-8 Charset

Question

I have a PHP script that fetches another web page and writes out all of its HTML. Everything works, except for a charset problem. My PHP file is encoded as UTF-8, and all my other PHP files display correctly, so the server itself is not the problem. What is missing from the code below? All Spanish letters look wrong. PS: When I type the original versions of these characters directly into the PHP file, they display correctly.

header("Content-Type: text/html; charset=utf-8");
function file_get_contents_curl($url)
{
    $ch=curl_init();
    curl_setopt($ch,CURLOPT_HEADER,0);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
    $data=curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html=file_get_contents_curl($_GET["u"]);
$doc=new DOMDocument();
@$doc->loadHTML($html);

Answer 1

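Simple: cURL returns the response bytes unchanged, so a UTF-8 page arrives as a UTF-8 string. DOMDocument::loadHTML() assumes ISO-8859-1 when the markup declares no charset, so you just need to decode the string with utf8_decode() before loading it. Note that utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8') does the same job.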

Description

string utf8_decode ( string $data )

This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.
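
As a rough illustration of why the decode step helps with the DOMDocument code in the question, here is a minimal sketch. It assumes a hard-coded UTF-8 string in place of the real curl_exec() result, and it uses mb_convert_encoding() as a stand-in for utf8_decode(); both choices are mine, not part of the original answer.

```php
<?php
// Stand-in for the body returned by curl_exec(): UTF-8 bytes, no charset declared.
$html = "<html><head><title>Se\u{F1}or Mu\u{F1}oz</title></head><body></body></html>";

// Same conversion that utf8_decode($html) performs: UTF-8 -> ISO-8859-1.
$latin1 = mb_convert_encoding($html, 'ISO-8859-1', 'UTF-8');

$doc = new DOMDocument();
// With no charset in the markup, the parser reads these bytes as ISO-8859-1.
$doc->loadHTML($latin1);

// Strings read back from the DOM are UTF-8.
$title = $doc->getElementsByTagName('title')->item(0)->textContent;
echo $title, "\n";
```

Keep in mind that this route only preserves characters that exist in ISO-8859-1. That covers Spanish, but text in other scripts would be replaced with question marks.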

Answer 2

You can send this header:

   header('Content-type: text/html; charset=UTF-8');

and then decode the string:

 $page = utf8_decode(curl_exec($ch));

It worked for me.

Answer 3

// Convert the response from the page's actual charset (Windows-1251 in this case) to UTF-8.
$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);
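
To see the conversion in isolation, the sketch below feeds iconv() a hard-coded Windows-1251 byte string instead of a live cURL response. The sample bytes and the fallback to the raw input are assumptions made for the example.

```php
<?php
// "Привет" as Windows-1251 bytes, standing in for the curl_exec() output.
$output = "\xCF\xF0\xE8\xE2\xE5\xF2";

// iconv() returns false if the conversion fails.
$result = iconv('Windows-1251', 'UTF-8', $output);
if ($result === false) {
    $result = $output; // fall back to the raw bytes
}
echo $result, "\n";
```

This only gives correct text when the source page really is Windows-1251. For another page, substitute the charset that page declares.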

Answer 4

function page_title($val){
    // simple_html_dom provides str_get_html(); include_once avoids redeclaring it on repeated calls.
    include_once(dirname(__FILE__).'/simple_html_dom.php');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $val);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
    curl_setopt($ch, CURLOPT_ENCODING, "gzip");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $return = curl_exec($ch);
    // Content-Type response header, e.g. "text/html; charset=ISO-8859-1".
    $contentType = (string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($return === false) return false;   // request failed
    $html = str_get_html($return);
    if (!$html) return false;              // empty or unparsable document

    $c = null;
    if (preg_match('/charset=["\']?([\w-]+)/i', $contentType, $found)) {
        // Charset declared in the HTTP header.
        $c = $found[1];
    }
    else {
        // Otherwise look for <meta http-equiv="Content-Type" content="...; charset=...">.
        $lookat = $html->find('meta[http-equiv=Content-Type]', 0);
        if ($lookat && preg_match('/charset=["\']?([\w-]+)/i', $lookat->content, $found)) {
            $c = $found[1];
        }
    }

    $titleNode = $html->find('title', 0);
    if (!$titleNode) return false;         // page has no <title>
    $title = $titleNode->innertext;
    // Convert only when a charset was found and it is not already UTF-8.
    if ($c !== null && strcasecmp($c, 'utf-8') !== 0) $title = mb_convert_encoding($title, 'UTF-8', $c);

    return $title;
}

Answer 5

I was fetching a Windows-1252 encoded file via cURL, and mb_detect_encoding(curl_exec($ch)) returned UTF-8. I then tried utf8_encode(curl_exec($ch)), and the characters came out correct.
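
Since mb_detect_encoding() only guesses, and utf8_encode() is deprecated as of PHP 8.2, one possible alternative is to check whether the bytes are already valid UTF-8 and convert only when they are not. The helper name to_utf8() and the Windows-1252 default are assumptions for this sketch, not something from the answer above.

```php
<?php
/**
 * Hypothetical helper: return $bytes as UTF-8, converting from $fallback
 * only when the input is not already valid UTF-8.
 */
function to_utf8(string $bytes, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($bytes, 'UTF-8')) {
        return $bytes;
    }
    return mb_convert_encoding($bytes, 'UTF-8', $fallback);
}

$fromLegacy  = to_utf8("caf\xE9");     // Windows-1252 bytes get converted
$alreadyUtf8 = to_utf8("caf\u{E9}");   // valid UTF-8 is returned untouched
echo $fromLegacy, "\n", $alreadyUtf8, "\n";
```

The fallback charset still has to be known or guessed per site; the validity check only prevents double-encoding text that was UTF-8 all along.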

Answer 6

First method (internal function)

The best approach I have tried is urlencode(). Keep in mind that you should not apply it to the whole URL; apply it only to the parts that need it. For example, if a request has two fields, 'text-fa' and 'text-en', containing Persian and English text respectively, you might only need to encode the Persian text, not the English one.
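
A small sketch of that idea follows. The endpoint and the sample field values are hypothetical; only the 'text-fa' and 'text-en' field names come from the description above.

```php
<?php
// Hypothetical endpoint and values, for illustration only.
$base   = 'https://example.com/translate';
$textFa = "\u{633}\u{644}\u{627}\u{645}"; // Persian text, UTF-8
$textEn = 'hello';                        // plain ASCII, safe as-is

// Encode only the value that contains non-ASCII characters, never the whole URL.
$url = $base . '?text-fa=' . urlencode($textFa) . '&text-en=' . $textEn;
echo $url, "\n";
```

If the English value could ever contain spaces, '&' or other reserved characters, encoding every value (for example with http_build_query()) is the safer choice.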

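Second method (using cURL function)

However, there are better ways if the range of characters that have to be encoded is more limited. One of them is to pass CURLOPT_ENCODING to curl_setopt(). Note that this option sets the Accept-Encoding request header and makes cURL decode compressed responses (an empty string accepts every encoding cURL supports); it concerns content encodings such as gzip, not the character set: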

curl_setopt($ch, CURLOPT_ENCODING, "");
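
For context, here is how that option might sit in a complete fetch function. The function name fetch_page() is hypothetical, and the sketch assumes the cURL extension is loaded. The returned body is still in whatever character set the server used.

```php
<?php
/**
 * Hypothetical helper: fetch $url and let cURL undo any transfer compression.
 * Returns the body as raw bytes, or false on failure.
 */
function fetch_page(string $url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // "" = advertise every content encoding this cURL build supports
    // and decompress the response automatically.
    curl_setopt($ch, CURLOPT_ENCODING, '');
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

A call such as fetch_page($_GET["u"]) would then be followed by a charset conversion step if the page is not UTF-8.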

6 answers

solution1
38 ACCEPTED 2012-11-22 15:44:25

solution2
16 2014-09-04 06:48:45

solution3
4 2017-07-30 12:41:50

solution4
3 2013-11-21 11:56:06

solution5
3 2016-05-20 16:26:07

solution6
2 2017-06-30 21:24:23
