
cURL PHP strange characters

I need your help with cURL in PHP.

I'm trying to fetch a page and convert it to JSON, but there are strange characters in my cURL response, so I can't convert it. These characters are displayed just before the !doctype of the page I am looking for.

I set header('Content-type: text/html; charset=utf-8'); in PHP and I used

$header = array(
    'Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    'Accept-Language: fr-fr,fr;q=0.7,en-us;q=0.5,en;q=0.3',
    'Accept-Charset: utf-8;q=0.7,*;q=0.7',
    'Keep-Alive: 300');

for cURL.

cURL Code :

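$ch = curl_init($searchUrl);

curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);           // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);           // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 20);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);           // disables certificate verification
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
// Request headers are sent with CURLOPT_HTTPHEADER; CURLOPT_HEADER is a
// boolean that only toggles whether response headers appear in the output.
curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');    // accept compressed responses
curl_setopt($ch, CURLOPT_USERAGENT, $agents[rand(0, count($agents) - 1)]);

$response = curl_exec($ch);

curl_close($ch);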

Does anyone have an idea?

Those three initial characters are a byte order mark (BOM). It is used to signal the encoding of a file; in UTF-8 it is the byte sequence EF BB BF. You can attempt to strip it by taking a substring of the HTML response:

$response = substr($response, 3);
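
The one-liner above removes the first three bytes unconditionally, so it would also cut real content from a response that has no BOM. A minimal sketch of a guarded variant follows; the helper name stripUtf8Bom and the sample $response string are hypothetical and used only for illustration, and the sketch assumes the response is UTF-8:

```php
<?php
// Hypothetical helper (not part of the original answer): removes a UTF-8 BOM
// only when the string actually starts with one.
function stripUtf8Bom(string $text): string
{
    $bom = "\xEF\xBB\xBF"; // the three bytes of the UTF-8 byte order mark

    // Compare only the first three bytes of the input against the BOM.
    if (strncmp($text, $bom, 3) === 0) {
        return (string) substr($text, 3);
    }

    return $text;
}

// Stand-in for a cURL response that begins with a BOM (assumed sample data).
$response = "\xEF\xBB\xBF<!doctype html><html></html>";
$clean = stripUtf8Bom($response);
```

With this check, a response that does not begin with a BOM is returned unchanged, so the helper can be applied to every response before further processing.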
