
cURL font encoding error

I want to get the contents of this page via cURL.

Here is my code:

$url = $_GET["url"];
$url = str_replace(" ", "%20", $url);
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, $url);
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$jsonData = curl_exec($curlSession);
curl_close($curlSession);
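// strpos() returns false when the needle is absent, and false >= 0 is true in PHP,
// so the result has to be compared against false explicitly.
if (strpos($url, "toomva.com") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}
if (strpos($url, "Toomva -") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}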
echo($jsonData);

Here you can find a live demo.

My problem is that the returned text is not what I expect. It is full of garbled characters:

1 0 0 : 0 0 : 2 4 , 4 0 0 - - > 0 0 : 0 0 : 3 3 , 1 4 0 M i k h i a n h t r n g t h y k h u n m t e m , t h g i a n n y n h c h t t a n b i n

Can you please help me with this?

Here are the first few bytes of the file you're trying to access:

$ curl -s 'http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt' | xxd | head
0000000: fffe 3100 0d00 0a00 3000 3000 3a00 3000  ..1.....0.0.:.0.
0000010: 3000 3a00 3200 3400 2c00 3400 3000 3000  0.:.2.4.,.4.0.0.
0000020: 2000 2d00 2d00 3e00 2000 3000 3000 3a00   .-.-.>. .0.0.:.
0000030: 3000 3000 3a00 3300 3300 2c00 3100 3400  0.0.:.3.3.,.1.4.
0000040: 3000 0d00 0a00 4d00 d71e 6900 2000 6b00  0.....M...i. .k.
0000050: 6800 6900 2000 6100 6e00 6800 2000 7400  h.i. .a.n.h. .t.
0000060: 7200 f400 6e00 6700 2000 7400 6800 a51e  r...n.g. .t.h...
0000070: 7900 2000 6b00 6800 7500 f400 6e00 2000  y. .k.h.u...n. .
0000080: 6d00 b71e 7400 2000 6500 6d00 2c00 2000  m...t. .e.m.,. .
0000090: 7400 6800 bf1e 2000 6700 6900 6100 6e00  t.h... .g.i.a.n.

It starts with 0xff 0xfe, which is the byte order mark for UTF-16 Little Endian. This information should really be provided in the file's HTTP headers, but apparently it is not in this case.
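
To make the byte order mark check concrete, here is a minimal sketch that guesses an encoding from the first bytes of a string. The function name detectBomEncoding() is made up for this illustration and is not part of PHP or of the original code; it only recognises the common Unicode BOMs and returns null when there is none.

```php
<?php
// Hypothetical helper (not from the original code): guess the encoding
// of a byte string from its byte order mark, if it has one.
function detectBomEncoding(string $bytes): ?string
{
    // Longer BOMs are checked first, because the UTF-32LE BOM begins
    // with the same two bytes as the UTF-16LE BOM.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $encoding) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null; // no BOM found; the encoding has to come from elsewhere
}

// The first eight bytes from the hex dump: ff fe 31 00 0d 00 0a 00
$sample = "\xFF\xFE\x31\x00\x0D\x00\x0A\x00";
echo detectBomEncoding($sample), "\n"; // prints "UTF-16LE"
```

A BOM is only a hint: many files have none, in which case the charset has to come from the HTTP headers or from prior knowledge of the source.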

You can use PHP's mb_convert_encoding() function to convert the file's content into whatever character set you're using for your website. For example, this will convert it to UTF-8:

$src = file_get_contents('http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt');
$utf8src = mb_convert_encoding($src,'UTF-8','UTF-16LE');
header('Content-Type: text/plain; charset=utf-8');
die($utf8src);
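
The same conversion can be applied to the cURL code from the question. The following is only a sketch under some assumptions: the helper names fetchRaw() and utf16leToUtf8() are invented for this example, the source is assumed to always be UTF-16LE, and the BOM is stripped before converting so that it does not show up in the output. Converting before any str_replace() call also matters, because an ASCII search string such as "toomva.com" does not match inside raw UTF-16 bytes.

```php
<?php
// Hypothetical helper: convert a UTF-16LE byte string to UTF-8.
function utf16leToUtf8(string $bytes): string
{
    // Drop the two-byte BOM (ff fe) first so it is not carried into the output.
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        $bytes = substr($bytes, 2);
    }
    return mb_convert_encoding($bytes, 'UTF-8', 'UTF-16LE');
}

// Hypothetical helper: fetch a URL with cURL and return the raw body,
// or null if the request fails.
function fetchRaw(string $url): ?string
{
    $curlSession = curl_init();
    curl_setopt($curlSession, CURLOPT_URL, $url);
    curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
    $body = curl_exec($curlSession);
    // The handle is released when $curlSession goes out of scope.
    return $body === false ? null : $body;
}

// Offline demonstration with bytes taken from the hex dump:
// BOM, then 4d00 d71e 6900, which is "Mỗi" in UTF-16LE.
$sample = "\xFF\xFE" . "M\x00" . "\xD7\x1E" . "i\x00";
echo utf16leToUtf8($sample), "\n"; // prints "Mỗi"

// Usage with a real URL (performs a network request, so it is left commented out):
// $raw = fetchRaw($url);
// if ($raw !== null) {
//     header('Content-Type: text/plain; charset=utf-8');
//     echo utf16leToUtf8($raw);
// }
```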

However, the file doesn't contain JSON data. Here are the first few lines:

1
00:00:24,400 --> 00:00:33,140
Mỗi khi anh trông thấy khuôn mặt em, thế gian này như chợt tan biến

2
00:00:33,140 --> 00:00:42,700
Tất cả đều phơi bày trong một ánh nhìn thoáng qua

When you echo $jsonData, use utf8_encode:

echo(utf8_encode($jsonData));


 