I'm parsing the columnist page of an online newspaper, and I have a problem with this site:
the parsing worked fine at first, but then it stopped working.
Here's my code:
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL,$gazeteAdress);
//curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'mozilla');
$query = curl_exec($curl_handle);
curl_close($curl_handle);
$html = new simple_html_dom();
$html->load($query);
I don't know why my code sometimes fails to parse the site. I first suspected the connection timeout, but that is not the problem, so I tried printing the HTML page fetched with curl instead.
echo $html;
Here is the result. (Sometimes my code does not parse the HTML page properly.)
Why are the HTML tags missing, and why am I seeing output like this? Can anyone help?
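The server returns the content compressed, so you should have curl send an Accept-Encoding header of 'gzip,deflate'. Setting CURLOPT_ENCODING does this and also makes curl decompress the response before returning it.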
Please add this line
curl_setopt($curl_handle, CURLOPT_ENCODING, "gzip,deflate");
after this
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'mozilla');
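For reference, here is a minimal sketch of the whole fetch with that option in place. The function names fetch_page and decode_if_gzipped are hypothetical helpers, not part of the original code, and the fallback check assumes the zlib extension is available so that gzdecode() exists. It is only one way to guard against a body that still arrives gzipped.

```php
<?php
// Hypothetical helper (not from the original post): return the body
// unchanged unless it still looks like a raw gzip stream.
function decode_if_gzipped(string $body): string
{
    // gzip streams start with the magic bytes 0x1f 0x8b
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body); // requires the zlib extension
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $body;
}

// Hypothetical helper: fetch a page and let cURL handle compression.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'mozilla');
    // Sends "Accept-Encoding: gzip,deflate" and decodes the response
    curl_setopt($ch, CURLOPT_ENCODING, 'gzip,deflate');

    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);

    // Fallback in case the body was not decoded by cURL
    return decode_if_gzipped($body);
}
```

With these in place, the question's code could call $html->load(fetch_page($gazeteAdress)); instead of loading $query directly.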
Add this at the top of your PHP script:
header('Content-Type: text/html; charset=utf-8');