
cURL font encoding-error

I want to fetch the contents of this page via cURL.

Here is my code:

$url = $_GET["url"];
$url = str_replace(" ", "%20", $url);
$curlSession = curl_init();
curl_setopt($curlSession, CURLOPT_URL, $url);
curl_setopt($curlSession, CURLOPT_BINARYTRANSFER, true);
curl_setopt($curlSession, CURLOPT_RETURNTRANSFER, true);
$jsonData = curl_exec($curlSession);
curl_close($curlSession);
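// strpos() returns false when there is no match, and false >= 0 is true in PHP,
// so the result has to be compared against false explicitly.
if (strpos($url, "toomva.com") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}
if (strpos($url, "Toomva -") !== false) {
    $jsonData = str_replace("toomva.com", "http://av.bsquochoai.ga ⇔ ", $jsonData);
}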
echo($jsonData);

Here you can find a live demo.

My problem is that the returned text is not what I expect. It is full of stray characters:

1 0 0 : 0 0 : 2 4 , 4 0 0 - - > 0 0 : 0 0 : 3 3 , 1 4 0 M i k h i a n h t r n g t h y k h u n m t e m , t h g i a n n y n h c h t t a n b i n

Can you please help me with this?

Here are the first few bytes of the file you're trying to access:

$ curl -s 'http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt' | xxd | head
0000000: fffe 3100 0d00 0a00 3000 3000 3a00 3000  ..1.....0.0.:.0.
0000010: 3000 3a00 3200 3400 2c00 3400 3000 3000  0.:.2.4.,.4.0.0.
0000020: 2000 2d00 2d00 3e00 2000 3000 3000 3a00   .-.-.>. .0.0.:.
0000030: 3000 3000 3a00 3300 3300 2c00 3100 3400  0.0.:.3.3.,.1.4.
0000040: 3000 0d00 0a00 4d00 d71e 6900 2000 6b00  0.....M...i. .k.
0000050: 6800 6900 2000 6100 6e00 6800 2000 7400  h.i. .a.n.h. .t.
0000060: 7200 f400 6e00 6700 2000 7400 6800 a51e  r...n.g. .t.h...
0000070: 7900 2000 6b00 6800 7500 f400 6e00 2000  y. .k.h.u...n. .
0000080: 6d00 b71e 7400 2000 6500 6d00 2c00 2000  m...t. .e.m.,. .
0000090: 7400 6800 bf1e 2000 6700 6900 6100 6e00  t.h... .g.i.a.n.

It starts with 0xff 0xfe, which is the byte order mark (BOM) for UTF-16 Little Endian. This information should really be provided in the HTTP response headers, but apparently it isn't in this case.
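If you would rather not hard-code the encoding, the BOM check described above can be done in PHP. This is only a minimal sketch: detectBomEncoding() is a hypothetical helper name, not a built-in function, and it only recognises the common Unicode BOMs.

```php
<?php
// Hypothetical helper: guess a character encoding from a leading byte order mark.
// Returns null when no known BOM is present.
function detectBomEncoding(string $bytes): ?string
{
    // The 4-byte UTF-32 marks are checked first, because the UTF-32LE BOM
    // begins with the same two bytes (FF FE) as the UTF-16LE BOM.
    $boms = [
        "\x00\x00\xFE\xFF" => 'UTF-32BE',
        "\xFF\xFE\x00\x00" => 'UTF-32LE',
        "\xEF\xBB\xBF"     => 'UTF-8',
        "\xFE\xFF"         => 'UTF-16BE',
        "\xFF\xFE"         => 'UTF-16LE',
    ];
    foreach ($boms as $bom => $encoding) {
        if (strncmp($bytes, $bom, strlen($bom)) === 0) {
            return $encoding;
        }
    }
    return null;
}

// First bytes of the subtitle file from the hex dump: BOM, then "1", CR, LF.
$sample = "\xFF\xFE1\x00\x0D\x00\x0A\x00";
echo detectBomEncoding($sample), "\n"; // UTF-16LE
```

A file without a BOM yields null, in which case you still have to know or guess the encoding some other way.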

You can use PHP's mb_convert_encoding() function to convert the file's content into whatever character set your website uses. For example, this converts it to UTF-8:

$src = file_get_contents('http://toomva.com/Data/subtitle/Duncan%20James%20ft.%20Keedie%20-%20I%20Believe%20My%20Heart.Vie_Syned.srt');
$utf8src = mb_convert_encoding($src,'UTF-8','UTF-16LE');
header('Content-Type: text/plain; charset=utf-8');
die($utf8src);
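The same conversion can be applied to the string returned by curl_exec() in the question's script. The sketch below also drops the leading BOM bytes before converting, so the mark does not end up in the output. utf16leToUtf8() is a hypothetical helper name, and a short in-memory sample stands in for the downloaded body.

```php
<?php
// Hypothetical helper: decode a UTF-16LE response body to UTF-8.
function utf16leToUtf8(string $body): string
{
    // Drop a leading UTF-16LE byte order mark (FF FE), if present,
    // so that it is not carried over into the converted text.
    if (strncmp($body, "\xFF\xFE", 2) === 0) {
        $body = substr($body, 2);
    }
    return mb_convert_encoding($body, 'UTF-8', 'UTF-16LE');
}

// In the question's script, $body would be the return value of curl_exec().
// Sample bytes taken from the hex dump: BOM, "1", CR, LF, "M", U+1ED7, "i".
$body = "\xFF\xFE" . "1\x00\x0D\x00\x0A\x00M\x00\xD7\x1Ei\x00";
$text = utf16leToUtf8($body);

echo $text; // "1", a line break, then "Mỗi"
```

In a web context you would still send the Content-Type header with charset=utf-8 before echoing the text, as in the snippet above.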

Note, however, that the file doesn't contain JSON data; it is a SubRip (.srt) subtitle file. Here are the first few lines:

1
00:00:24,400 --> 00:00:33,140
Mỗi khi anh trông thấy khuôn mặt em, thế gian này như chợt tan biến

2
00:00:33,140 --> 00:00:42,700
Tất cả đều phơi bày trong một ánh nhìn thoáng qua
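If you need the subtitles as structured data rather than raw text, the cues shown above could be split up along these lines. This is a rough sketch, not a full SubRip parser: parseSrt() is a hypothetical helper name, and it assumes the text has already been converted to UTF-8 with the BOM removed.

```php
<?php
// Hypothetical helper: split UTF-8 SubRip (.srt) text into cues.
function parseSrt(string $srt): array
{
    $cues = [];
    // Normalise line endings, then split on the blank lines between cues.
    $srt = str_replace(["\r\n", "\r"], "\n", trim($srt));
    foreach (preg_split('/\n{2,}/', $srt) as $block) {
        $lines = explode("\n", $block);
        if (count($lines) < 3) {
            continue; // not a complete cue: index, timing, text
        }
        // The timing line looks like "00:00:24,400 --> 00:00:33,140".
        $times = explode(' --> ', $lines[1]);
        if (count($times) !== 2) {
            continue;
        }
        $cues[] = [
            'index' => (int) $lines[0],
            'start' => $times[0],
            'end'   => $times[1],
            'text'  => implode("\n", array_slice($lines, 2)),
        ];
    }
    return $cues;
}

$srt = "1\r\n00:00:24,400 --> 00:00:33,140\r\n"
     . "Mỗi khi anh trông thấy khuôn mặt em, thế gian này như chợt tan biến\r\n\r\n"
     . "2\r\n00:00:33,140 --> 00:00:42,700\r\n"
     . "Tất cả đều phơi bày trong một ánh nhìn thoáng qua\r\n";

$cues = parseSrt($srt);
echo count($cues), "\n";      // 2
echo $cues[1]['start'], "\n"; // 00:00:33,140
```

The resulting array can then be passed to json_encode() if the page consuming it really does expect JSON.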

When you echo $jsonData, use utf8_encode:

echo(utf8_encode($jsonData));
