I'm working with PHP Simple HTML DOM and just discovered that it can't read images whose URL is in a `data-src` attribute, or whose `<img src>` has no `http:`, e.g. `<img src="//static.mysite.com/123.jpg">`.
Is there any way to make it work?
My code is:
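```php
if ($htm->find('img')) {
    foreach ($htm->find('img') as $element) {
        $raw = file_get_contents_curl($element->src);
        if ($raw === false || $raw === '') {
            continue; // download failed or returned nothing
        }
        $im = @imagecreatefromstring($raw);
        if ($im === false) {
            continue; // not an image format GD can decode
        }
        $width = imagesx($im);
        $height = imagesy($im);
        if ($width > 500 && $height >= 350) {
            $hasimg = '1';
            echo '<img src=\'' . $element->src . '\'>';
        }
    } // end foreach
} // end if htm
```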
It works for me:

```php
$doc = str_get_html('<img data-src="foo">');
echo $doc->find('img', 0)->getAttribute('data-src');
//=> outputs: foo
```

So in your code:

```php
echo $htm->find('img', 0)->getAttribute('data-src');
```
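To handle pages that mix lazily loaded and normal images, one option is to prefer `data-src` and fall back to `src`. The helper below is a hypothetical sketch (`pick_image_url` is not part of Simple HTML DOM), and it assumes, as in common versions of the library, that a missing attribute comes back as `false`.

```php
<?php
// Hypothetical helper, not part of Simple HTML DOM.
// Returns the first non-empty value among data-src and src, or null.
// A missing attribute is assumed to arrive as false, so anything that
// is not a non-empty string is skipped.
function pick_image_url($dataSrc, $src): ?string
{
    foreach ([$dataSrc, $src] as $candidate) {
        if (is_string($candidate) && trim($candidate) !== '') {
            return trim($candidate);
        }
    }
    return null;
}
```

Inside the loop it would be called as `pick_image_url($element->getAttribute('data-src'), $element->src)`, skipping the image when it returns `null`.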
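If `file_get_contents_curl()` is a function you defined in your own code, like the one in this question, you can tell cURL which protocol to assume for URLs that don't specify one. Note that `CURLOPT_PROTOCOLS` does not do this: it only restricts which protocols cURL is allowed to use. The option for a default protocol is `CURLOPT_DEFAULT_PROTOCOL` (PHP 7.0.7+ with libcurl 7.45.0+):

```php
curl_setopt($ch, CURLOPT_DEFAULT_PROTOCOL, 'http');
```

That way, if the image `src` attribute holds a URL without a protocol, cURL falls back to HTTP. Depending on your libcurl version, a protocol-relative URL that starts with `//` may still be rejected, in which case add the protocol yourself as described below.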
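Since the helper from the linked question isn't reproduced here, this is a hypothetical sketch of a `file_get_contents_curl()` that sets `CURLOPT_DEFAULT_PROTOCOL`, the option libcurl uses for URLs with no scheme. It is guarded with `defined()` because older PHP/libcurl builds lack the constant; the return-transfer and redirect settings are my assumptions, not taken from the original helper.

```php
<?php
// Hypothetical sketch of file_get_contents_curl(); your own helper may differ.
// Returns the response body as a string, or false if the request fails.
function file_get_contents_curl(string $url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, e.g. http -> https
    // Protocol to assume when $url has none (PHP 7.0.7+ and libcurl 7.45.0+ only).
    if (defined('CURLOPT_DEFAULT_PROTOCOL')) {
        curl_setopt($ch, CURLOPT_DEFAULT_PROTOCOL, 'http');
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}
```

The returned string is what `imagecreatefromstring()` expects; a `false` return means the request failed and the image should be skipped.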
Leaving out the protocol (`http`/`https`) produces what RFC 3986 calls a "network-path reference", meaning the URL takes the protocol of the page it is embedded in. `file_get_contents()` and cURL can't resolve that on their own, because they don't know which page the URL came from.
Long story short, you have to add the protocol yourself.
Try this:

```php
$url = $element->src;
if (substr($url, 0, 2) == '//') {
    $url = 'http:' . $url; // protocol-relative URL: assume plain HTTP
}
$raw = file_get_contents_curl($url);
```
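Under RFC 3986, a network-path reference takes the scheme of its base URL, so hardcoding `http:` is only a guess when the page was served over HTTPS. A minimal sketch, assuming you still have the URL of the page you loaded into Simple HTML DOM (`absolutize_network_path` and `$pageUrl` are hypothetical names):

```php
<?php
// Hypothetical helper: turn a network-path reference ("//host/path")
// into an absolute URL by borrowing the scheme of the page it came from.
// Other URLs are returned unchanged; relative paths are not handled here.
function absolutize_network_path(string $url, string $pageUrl): string
{
    if (substr($url, 0, 2) !== '//') {
        return $url; // already absolute, or a form this sketch does not handle
    }
    $scheme = parse_url($pageUrl, PHP_URL_SCHEME);
    if (!is_string($scheme) || $scheme === '') {
        $scheme = 'http'; // page URL has no usable scheme: fall back to plain HTTP
    }
    return $scheme . ':' . $url;
}
```

In the loop this becomes `$raw = file_get_contents_curl(absolutize_network_path($element->src, $pageUrl));`.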