simplehtmldom class and image

I am using the simplehtmldom class to get all the images from a website, and I am trying to get the width and height of each image it returns.

What I want to accomplish is this: if an image is less than 50px wide, it should not be displayed.

I tried getimagesize(), but it keeps timing out, I think because of the number of images.

Any ideas?

Thanks.

Using getimagesize() is very slow, especially if you're scraping a site and getting many images. PHP has to download the entirety of each image BEFORE it can pass the data to getimagesize(), so if you're working on (for instance) a large photo gallery, you could be downloading many megabytes per image.

There are a few things you can do to speed up the process:

  1. Check the height/width attributes of the <img> tag and only grab images where either is larger than 50. The attributes might not be accurate, as the web page creator could be stretching or shrinking the image, but checking them saves you from downloading small images whose size is stated accurately.

  2. Instead of fetching the images in full with getimagesize(), you could try to fetch only the first couple hundred bytes of each, which will contain the image header information. For GIF/JPEG images, the height/width will be very near the beginning of the file, so you'd save on file transfer overhead.

  3. Increase your script's execution time. Fetching all the images will naturally be a fairly slow process, and you'll most likely run up against PHP's max_execution_time.
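To make option 2 concrete, here is a rough sketch of reading only the leading bytes of an image and pulling the dimensions out of a GIF or JPEG header. The function names (fetchLeadingBytes, gifSizeFromHeader, jpegSizeFromHeader) are made up for this example, and it assumes allow_url_fopen is enabled. A server may ignore the Range header, and a JPEG with a large EXIF block can put its size marker beyond the bytes fetched, in which case the parser returns null and you would fall back to a full getimagesize() call.

```php
<?php
// Hypothetical helper: read at most $length bytes from the start of a URL.
// Assumes allow_url_fopen is enabled; the Range header is only a hint.
function fetchLeadingBytes(string $url, int $length = 512, int $timeoutSeconds = 5): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout' => $timeoutSeconds,
            'header'  => 'Range: bytes=0-' . ($length - 1) . "\r\n",
        ],
    ]);
    $handle = @fopen($url, 'rb', false, $context);
    if ($handle === false) {
        return null;
    }
    $bytes = stream_get_contents($handle, $length);
    fclose($handle);
    return $bytes === false ? null : $bytes;
}

// GIF: 6-byte signature, then width and height as 16-bit little-endian values.
function gifSizeFromHeader(string $bytes): ?array
{
    $signature = substr($bytes, 0, 6);
    if (strlen($bytes) < 10 || ($signature !== 'GIF87a' && $signature !== 'GIF89a')) {
        return null;
    }
    $dims = unpack('vwidth/vheight', substr($bytes, 6, 4));
    return [$dims['width'], $dims['height']];
}

// JPEG: walk the segments until a start-of-frame (SOF) marker is found.
function jpegSizeFromHeader(string $bytes): ?array
{
    $len = strlen($bytes);
    if ($len < 4 || substr($bytes, 0, 2) !== "\xFF\xD8") {
        return null;
    }
    $pos = 2;
    while ($pos + 4 <= $len) {
        if ($bytes[$pos] !== "\xFF") {
            return null; // not positioned on a marker, give up
        }
        $marker = ord($bytes[$pos + 1]);
        if ($marker === 0xFF) {
            $pos++; // fill byte before the real marker
            continue;
        }
        // SOF markers are 0xC0-0xCF, except 0xC4, 0xC8 and 0xCC.
        $isSof = $marker >= 0xC0 && $marker <= 0xCF
            && !in_array($marker, [0xC4, 0xC8, 0xCC], true);
        if ($isSof) {
            if ($pos + 9 > $len) {
                return null; // frame header was cut off
            }
            // Layout: marker(2) length(2) precision(1) height(2) width(2)
            $dims = unpack('nheight/nwidth', substr($bytes, $pos + 5, 4));
            return [$dims['width'], $dims['height']];
        }
        $segmentLength = unpack('n', substr($bytes, $pos + 2, 2))[1];
        $pos += 2 + $segmentLength;
    }
    return null; // SOF marker not within the bytes we have
}
```

Usage would look something like `$head = fetchLeadingBytes($url); $size = $head === null ? null : (gifSizeFromHeader($head) ?? jpegSizeFromHeader($head));`, with `$size` being either null or a `[width, height]` pair.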

Comment follow-up:

Well, if there's no height/width, then you can jump straight to fetching the image (or the first part of the image) and extracting the height/width directly. Checking the height/width in the tag is just to save you the trouble of having to fetch the image in the first place.

As for extracting the height/width from the HTML, it's just a matter of calling ->getAttribute('width') and ->getAttribute('height') once you've found an <img> tag with SimpleHTMLDOM. Something like this:

$dom = file_get_html('http://example.com/somepage.html');
$images = $dom->find('img');

foreach($images as $img) {
    $h = $img->getAttribute('height');
    $w = $img->getAttribute('width');

    if (!is_numeric($h) || !is_numeric($w)) {
       // height and/or width missing from the tag (or not a plain number), so fetch image and get size that way
       $h = ...
       $w = ...
    }

    if (($h >= 50) && ($w >= 50)) {
        // image is bigger than 50x50, so display it...
    }
}

This probably won't work if you cut and paste it, since I'm writing it off the top of my head, but it should be enough to get you started.

It is difficult to help you since you didn't post any of the source code you are using.

You should know that the height and width attributes won't necessarily be in the HTML, so simplehtmldom alone won't always give you the dimensions. You will need to use something else for this. You are on the right track with getimagesize(). This function can time out if the host you are trying to reach isn't reachable. You need to handle this appropriately with set_time_limit(). You should also handle the case where getimagesize() fails; it returns false when it cannot read the image.
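As a rough illustration of handling both the timeout and the failure case, the sketch below wraps getimagesize() in a small check. The function name isWideEnough and its parameters are invented for this example, and it uses the default_socket_timeout ini setting to cap how long a remote read may block, which is an assumption about what suits your setup rather than the only way to do it.

```php
<?php
// Hypothetical helper: true only when the width can be determined
// and is at least $minWidth pixels.
function isWideEnough(string $src, int $minWidth = 50, int $timeoutSeconds = 5): bool
{
    // Cap how long a single remote read may block (used by URL wrappers).
    $previous = ini_set('default_socket_timeout', (string) $timeoutSeconds);

    // getimagesize() returns false on failure; @ hides the warning it raises.
    $info = @getimagesize($src);

    // Restore the previous timeout so other code is unaffected.
    if ($previous !== false) {
        ini_set('default_socket_timeout', $previous);
    }

    if ($info === false) {
        return false; // unreachable host, timeout, or not an image
    }

    return $info[0] >= $minWidth; // index 0 is width, index 1 is height
}
```

Inside the scraping loop you could then write `if (isWideEnough($img->src)) { ... }`, keeping in mind that relative src values would first need to be turned into absolute URLs.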
