PHP Curl UTF-8 Charset
I have a PHP script that calls another web page and writes out all of the page's HTML. Everything works, but there is a charset problem. My PHP file is encoded as UTF-8, and all my other PHP files work fine (so there is no problem with the server). What is missing in this code? All the Spanish letters look weird. PS. When I write the original versions of these weird characters directly into the PHP file, they all display correctly.
header("Content-Type: text/html; charset=utf-8");
function file_get_contents_curl($url)
{
$ch=curl_init();
curl_setopt($ch,CURLOPT_HEADER,0);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_FOLLOWLOCATION,1);
$data=curl_exec($ch);
curl_close($ch);
return $data;
}
$html=file_get_contents_curl($_GET["u"]);
$doc=new DOMDocument();
@$doc->loadHTML($html);
Simple: when you use curl, it encodes the string to UTF-8; you just need to decode it.
Description
string utf8_decode ( string $data )
This function decodes data, assumed to be UTF-8 encoded, to ISO-8859-1.
You can use this header:
header('Content-type: text/html; charset=UTF-8');
and then decode the string:
$page = utf8_decode(curl_exec($ch));
It worked for me.
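Note that utf8_decode() is deprecated as of PHP 8.2. As a minimal sketch, the same UTF-8 to ISO-8859-1 conversion can be written with mb_convert_encoding(), assuming the mbstring extension is available. The helper name utf8_to_latin1 is hypothetical and only used for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): converts a UTF-8
// string to ISO-8859-1, the same conversion utf8_decode() performs.
function utf8_to_latin1(string $data): string
{
    return mb_convert_encoding($data, 'ISO-8859-1', 'UTF-8');
}

// "é" takes two bytes in UTF-8 (0xC3 0xA9) and one byte in ISO-8859-1 (0xE9).
$latin1 = utf8_to_latin1("caf\xC3\xA9");
echo bin2hex($latin1), "\n"; // prints 636166e9
```

This only helps when the fetched page really is UTF-8 and the output is meant to be ISO-8859-1; with a UTF-8 Content-Type header, decoding may make things worse rather than better.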
$output = curl_exec($ch);
$result = iconv("Windows-1251", "UTF-8", $output);
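The source charset is hard-coded in the two lines above. A hedged sketch of reading it from the response's Content-Type header instead might look like the following; the helper names charset_from_content_type and body_to_utf8 are hypothetical, and the code assumes the server declares its charset correctly.

```php
<?php
// Hypothetical helper: extracts the charset from a Content-Type value such as
// "text/html; charset=windows-1251". Returns null when none is declared.
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*"?([^";\s]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: converts $body to UTF-8 when a non-UTF-8 charset is known.
function body_to_utf8(string $body, ?string $charset): string
{
    if ($charset === null || strcasecmp($charset, 'UTF-8') === 0) {
        return $body; // nothing known to convert from, or already UTF-8
    }
    $converted = iconv($charset, 'UTF-8', $body);
    // iconv() returns false on failure; fall back to the unconverted body.
    return $converted === false ? $body : $converted;
}

// Usage with cURL (call curl_getinfo() before curl_close()):
// $output  = curl_exec($ch);
// $charset = charset_from_content_type(curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
// $result  = body_to_utf8($output, $charset);
```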
function page_title($val){
// include_once avoids redeclaration errors if page_title() is called more than once.
include_once(dirname(__FILE__).'/simple_html_dom.php');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$val);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0');
curl_setopt($ch, CURLOPT_ENCODING , "gzip");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
$return = curl_exec($ch);
$encot = false;
$charset = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
curl_close($ch);
$html = str_get_html('"'.$return.'"');
// Match the charset parameter regardless of media type or letter case.
if(preg_match('/charset=([^;\s]+)/i', (string)$charset, $m)) {
$c = trim($m[1], '"\'');
$encot = true;
}
else {
$lookat=$html->find('meta[http-equiv=Content-Type]',0);
// The meta tag may be missing, so guard against a null result.
$chrst = $lookat ? $lookat->content : '';
$p = preg_match('/charset=([^;\s"\']+)/i', $chrst, $found) ? trim($found[1]) : '';
if($p !== '')
{
$c = $p;
$encot = true;
}
}
$title = $html->find('title')[0]->innertext;
if($encot == true && strcasecmp($c, 'UTF-8') !== 0) $title = mb_convert_encoding($title,'UTF-8',$c);
return $title;
}
I was fetching a windows-1252 encoded file via cURL, and mb_detect_encoding(curl_exec($ch)); returned UTF-8. I tried utf8_encode(curl_exec($ch)); and the characters were correct.
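Two caveats apply here: utf8_encode() converts from ISO-8859-1 only, and it is deprecated as of PHP 8.2. A minimal sketch of an alternative is to validate the bytes as UTF-8 first and convert only when that check fails. The helper name ensure_utf8 is hypothetical, and the Windows-1252 fallback is an assumption about the source, not something the code detects.

```php
<?php
// Hypothetical helper: returns $raw as UTF-8. Input that is not valid UTF-8
// is assumed (not detected) to be in the $fallback encoding.
function ensure_utf8(string $raw, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}

// Usage with cURL:
// $text = ensure_utf8(curl_exec($ch));
```

Checking validity with mb_check_encoding() avoids relying on a guess from mb_detect_encoding(), which can report UTF-8 for input in another encoding, as happened above.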
The best way I have tried is to use urlencode(). Keep in mind, don't use it on the whole URL; use it only for the parts that need it. For example, if a request has two fields, 'text-fa' and 'text-en', containing Persian and English text respectively, you might only need to encode the Persian text, not the English one.
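As a rough sketch of that idea, the snippet below percent-encodes only the field values and leaves the rest of the URL alone. The helper name build_request_url and the example.com endpoint are placeholders, not part of the original answer.

```php
<?php
// Hypothetical helper: builds a URL, percent-encoding only the field values.
function build_request_url(string $base, array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        $pairs[] = $name . '=' . urlencode($value);
    }
    return $base . '?' . implode('&', $pairs);
}

// The endpoint below is a placeholder for illustration.
echo build_request_url('http://example.com/translate', [
    'text-fa' => 'سلام',
    'text-en' => 'hello',
]), "\n";
```

A plain ASCII value such as 'hello' passes through urlencode() unchanged, so applying it to every value is harmless; only the Persian text is actually altered.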
However, there are better ways if the range of characters that have to be encoded is more limited. One of them is to pass CURLOPT_ENCODING to curl_setopt():
curl_setopt($ch, CURLOPT_ENCODING, "");