
PHP Curl UTF-8 Charset

I have a PHP script that fetches another web page and outputs all of its HTML. Everything works, except that there is a charset problem: all the Spanish letters look weird. My PHP file is saved as UTF-8, and all my other PHP files work fine, so the server is not the problem. What is missing from this code? PS: when I write the original versions of these garbled characters directly into a PHP file, they all display correctly.

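header("Content-Type: text/html; charset=utf-8");

// Fetch a URL with cURL and return the response body as a string
function file_get_contents_curl($url)
{
    $ch=curl_init();
    curl_setopt($ch,CURLOPT_HEADER,0);          // don't include response headers in the output
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);  // return the body instead of printing it
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);  // follow redirects
    $data=curl_exec($ch);
    curl_close($ch);
    return $data;
}
$html=file_get_contents_curl($_GET["u"]);  // target URL passed as ?u=...
$doc=new DOMDocument();
@$doc->loadHTML($html);                    // @ suppresses warnings about malformed HTML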

Simple: when you use curl, the string you get back is UTF-8 encoded; you just need to decode it.

Description

string utf8_decode ( string $data )

This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.

You can use this header:

   header('Content-type: text/html; charset=UTF-8');

and then decode the string:

 $page = utf8_decode(curl_exec($ch));
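For reference, a minimal sketch of what utf8_decode() does to the bytes: each UTF-8 sequence for a Latin-1 character is collapsed into a single ISO-8859-1 byte. utf8_decode() is deprecated as of PHP 8.2, so this sketch uses mb_convert_encoding() with 'ISO-8859-1' as the target instead, assuming the mbstring extension is available. The helper name latin1_from_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: same result as utf8_decode() for characters in the
// ISO-8859-1 range, using mbstring instead of the deprecated function.
function latin1_from_utf8(string $utf8): string
{
    return mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
}

$utf8   = "Espa\u{00F1}a";          // "España": ñ takes 2 bytes in UTF-8 (C3 B1)
$latin1 = latin1_from_utf8($utf8);  // ñ becomes the single byte F1

echo strlen($utf8), ' ', strlen($latin1), "\n"; // 7 6
echo bin2hex($latin1), "\n";                    // 45737061f161
```

Keep in mind that the declared charset and the bytes you actually send have to agree: if the output is sent with charset=UTF-8, the browser will expect UTF-8 bytes, not ISO-8859-1 ones.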

It worked for me:

$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);

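The iconv() call above assumes the remote page really is Windows-1251 (Cyrillic). A minimal sketch that makes the conversion testable without a network request; the helper name cp1251_to_utf8 is hypothetical, and the sample bytes are the Windows-1251 encoding of the Russian word "Привет".

```php
<?php
// Hypothetical helper: convert a Windows-1251 byte string to UTF-8.
// iconv() returns false (and emits a notice) on bytes it cannot convert,
// e.g. 0x98, which is undefined in Windows-1251, so check for that.
function cp1251_to_utf8(string $bytes): string
{
    $result = iconv('Windows-1251', 'UTF-8', $bytes);
    if ($result === false) {
        throw new RuntimeException('Input is not valid Windows-1251');
    }
    return $result;
}

// "Привет" encoded as Windows-1251: one byte per letter.
$raw = "\xCF\xF0\xE8\xE2\xE5\xF2";
echo cp1251_to_utf8($raw), "\n"; // Привет
```

In UTF-8 each of these Cyrillic letters takes two bytes, so the 6-byte input becomes a 12-byte string.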
require_once __DIR__ . '/simple_html_dom.php';  // load the parser once, not on every call

function page_title($val)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $val);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
    curl_setopt($ch, CURLOPT_ENCODING, "gzip");
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    $return = curl_exec($ch);
    // e.g. "text/html; charset=ISO-8859-1", or null if the server sent none
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);

    if ($return === false) {
        return null;
    }
    $html = str_get_html($return);
    if (!$html) {
        return null;
    }

    $c = null;
    // 1. Prefer the charset declared in the HTTP Content-Type header
    if ($contentType && preg_match('/charset=([^\s;]+)/i', $contentType, $found)) {
        $c = trim($found[1], "\"'");
    } else {
        // 2. Fall back to <meta http-equiv="Content-Type" content="text/html; charset=...">
        $lookat = $html->find('meta[http-equiv=Content-Type]', 0);
        if ($lookat && preg_match('/charset=([^\s;]+)/i', $lookat->content, $found)) {
            $c = trim($found[1], "\"'");
        }
    }

    $titleNode = $html->find('title', 0);
    if (!$titleNode) {
        return null;
    }
    $title = $titleNode->innertext;

    // Convert only when a non-UTF-8 charset was declared (case-insensitive check)
    if ($c !== null && strcasecmp($c, 'UTF-8') !== 0) {
        $title = mb_convert_encoding($title, 'UTF-8', $c);
    }

    return $title;
}

I was fetching a Windows-1252-encoded file via cURL, and mb_detect_encoding(curl_exec($ch)); returned UTF-8. I tried utf8_encode(curl_exec($ch)); and the characters came out correct.
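utf8_encode() converts from ISO-8859-1, not Windows-1252, so it only gives correct results for Windows-1252 text that avoids the 0x80-0x9F range, where Windows-1252 places characters such as € and curly quotes. utf8_encode() is also deprecated as of PHP 8.2. A minimal sketch that names the source charset explicitly with mb_convert_encoding(), assuming the mbstring extension is available; the helper name cp1252_to_utf8 is hypothetical.

```php
<?php
// Hypothetical helper: convert Windows-1252 bytes (e.g. the string returned
// by curl_exec()) to UTF-8, naming the source charset explicitly.
function cp1252_to_utf8(string $bytes): string
{
    return mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252');
}

$raw = "caf\xE9 \x80";            // "café €" in Windows-1252
echo cp1252_to_utf8($raw), "\n"; // café €
```

With utf8_encode(), the byte 0x80 would become the invisible control character U+0080 instead of €.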

First method (internal function)

The best way I have tried is to use urlencode(). Keep in mind: don't use it on the whole URL; use it only on the parts that need it. For example, if a request has two fields, 'text-fa' and 'text-en', containing Persian and English text respectively, you might only need to encode the Persian text, not the English one.
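A minimal sketch of encoding only the Persian field when building the request body. The field names text-fa and text-en come from the example above; the values are made up for illustration, and the Persian word is written with \u{} escapes so the snippet does not depend on the source file's encoding.

```php
<?php
// Hypothetical field values: the Persian word "salam" and an English word.
$fa = "\u{0633}\u{0644}\u{0627}\u{0645}"; // 4 letters, 8 bytes in UTF-8
$en = "hello";

// Encode only the part that needs it, as described above.
$body = 'text-fa=' . urlencode($fa) . '&text-en=' . $en;
echo $body, "\n"; // text-fa=%D8%B3%D9%84%D8%A7%D9%85&text-en=hello

// The body could then be sent with, e.g.:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
```

Leaving the English value unencoded only works because it is plain ASCII letters; English text containing spaces, & or = would still need urlencode(). http_build_query() encodes every field and produces the same string here.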

Second method (using a cURL function)

However, there are better ways if the range of characters that have to be encoded is more limited. One of them is CURLOPT_ENCODING, passed to curl_setopt():

curl_setopt($ch, CURLOPT_ENCODING, "");
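Note that CURLOPT_ENCODING sets the Accept-Encoding request header, which covers HTTP compression such as gzip or deflate; an empty string makes curl advertise every encoding it supports and decompress the response automatically. It does not convert the character set, so a conversion step is still needed when the page is not UTF-8. A minimal sketch combining the two, assuming the charset is declared in the Content-Type response header and that mbstring is available; fetch_utf8() and body_to_utf8() are hypothetical names.

```php
<?php
// Hypothetical helper: convert a response body to UTF-8 based on the charset
// declared in its Content-Type header, if any. On PHP 8, mb_convert_encoding()
// throws a ValueError if the declared charset name is unknown.
function body_to_utf8(string $body, ?string $contentType): string
{
    if ($contentType !== null
        && preg_match('/charset=["\']?([\w.:-]+)/i', $contentType, $m)
        && strcasecmp($m[1], 'UTF-8') !== 0) {
        return mb_convert_encoding($body, 'UTF-8', $m[1]);
    }
    return $body; // no charset declared, or already UTF-8
}

// Hypothetical helper: fetch a URL, let curl handle compression, then fix the charset.
function fetch_utf8(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_ENCODING, '');  // accept and decode any supported compression
    $body = curl_exec($ch);
    // Content-Type of the final response; null if the server sent none
    $contentType = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    curl_close($ch);
    if ($body === false) {
        return null;
    }
    return body_to_utf8($body, $contentType ?: null);
}
```

Only body_to_utf8() is pure, so it can be checked without a network request, e.g. body_to_utf8("caf\xE9", 'text/html; charset=ISO-8859-1') yields "café".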

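Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.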

© 2020-2024 STACKOOM.COM