I am trying to parse a website and grab the name or URL of the image.
Example URL: http://www.theworkingmanstore.com/georgia-gr14-infants-romeo.aspx
There are six or more images in a single <td>, and I only want to get the first img src in that <td>.
I am sure it can be done with a DOM parser, but I have no experience with one.
Any assistance would be appreciated.
Thanks
$url = 'http://www.theworkingmanstore.com/georgia-gr14-infants-romeo.aspx';
$html = file_get_contents($url);
$reg = '/img src=["\']?([^"\' ]*)["\' ]/';
preg_match_all($reg, $html, $m);
$arr = array_map(function($v){
    return trim(str_replace(array('img src=', 'http://www.theworkingmanstore.com'), '', $v), '"');
}, $m[0]);
print_r($arr);
Output from the regex:
Array
(
[0] => /images/logo2.png
[1] => /images/mod_head_category_lt.gif
[2] => '/images/products/display/GR14_EXTRALARGE.jpg'
[3] => '/images/products/thumb/GR14_EXTRALARGE.jpg'
[4] => '/images/products/thumb/GR14_8_EXTRALARGE.jpg'
[5] => '/images/products/thumb/GR14_5_EXTRALARGE.jpg'
[6] => '/images/products/thumb/GR14_3_EXTRALARGE.jpg'
[7] => '/images/products/thumb/GR14_42_EXTRALARGE.jpg'
[8] => '/images/products/thumb/GR14_2_EXTRALARGE.jpg'
[9] => /images/freeshipping.jpg
[10] => /images/facebook_32.png
[11] => images/twitter_32.png
[12] => images/googleplus_32.png
[13] => images/pinterest_32.png
[14] => /images/payments.gif
[15] => /images/brands/the-working-man.jpg
)
I tried the DOM parser suggestion:
$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
echo $xpath->evaluate(
'string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)'
);
This produced the following warning:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nav invalid in Entity
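That warning comes from libxml's HTML parser (used by DOMDocument), which is based on HTML 4 and reports unknown HTML5 elements such as nav this way. It is a warning, not a fatal error, and the document is still parsed. A minimal sketch, assuming the markup is already in a string: libxml_use_internal_errors(true) collects the parse errors instead of printing them, and the previous setting is restored afterwards. The function name firstZoomerImgSrc is hypothetical.

```php
<?php
// Hypothetical helper: parse HTML without printing libxml warnings
// (e.g. "Tag nav invalid") and return the first img src inside a#Zoomer.
function firstZoomerImgSrc(string $html): string
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();                        // discard the collected parse errors
    libxml_use_internal_errors($previous);        // restore the previous setting

    $xpath = new DOMXPath($dom);
    // string() yields the first matching node's value, or '' if nothing matches.
    return $xpath->evaluate(
        'string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)'
    );
}
```

Usage would be along the lines of echo firstZoomerImgSrc(file_get_contents($url));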
In the DOM, everything is a node: the img elements and their src attributes, too. XPath allows you to fetch node lists from a DOM.
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//img/@src') as $src) {
    echo $src->value, "\n";
}
Output:
http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
http://www.theworkingmanstore.com/images/products/detail/GR14_EXTRALARGE.jpg
/images/products/thumb/GR14_EXTRALARGE.jpg
/images/products/thumb/GR14_8_EXTRALARGE.jpg
/images/products/thumb/GR14_5_EXTRALARGE.jpg
/images/products/thumb/GR14_3_EXTRALARGE.jpg
/images/products/thumb/GR14_42_EXTRALARGE.jpg
/images/products/thumb/GR14_2_EXTRALARGE.jpg
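To illustrate that elements and attributes are both nodes: querying //img returns DOMElement objects, while //img/@src returns DOMAttr objects whose value property holds the attribute text. A minimal sketch; the function describeFirstImage and the sample markup are hypothetical and assume at least one img is present.

```php
<?php
// Hypothetical helper: read the same src once via the element node
// and once via the attribute node, and report the node classes.
function describeFirstImage(string $html): array
{
    $previous = libxml_use_internal_errors(true); // silence libxml parse warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $img  = $xpath->query('//img')->item(0);      // element node: DOMElement
    $attr = $xpath->query('//img/@src')->item(0); // attribute node: DOMAttr

    return [
        get_class($img),
        get_class($attr),
        $img->getAttribute('src') === $attr->value, // same text either way
    ];
}

print_r(describeFirstImage('<p><img src="/images/logo2.png"></p>'));
```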
XPath allows quite complex conditions. The following example outputs the src attribute of the first img inside each td.
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//td/descendant::img[1]/@src') as $src) {
    echo $src->value, "\n";
}
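One detail about the descendant::img[1] step: the [1] predicate is applied per context node of that step. In //td/descendant::img[1] it picks the first img among all descendants of each td, whereas the shorter-looking //td//img[1] selects every img that is the first img child of its own parent, so it can return several images from one td. A sketch comparing both on hypothetical sample markup; the helper srcValues is illustrative only.

```php
<?php
// Hypothetical helper: return the value of every attribute node matched by $query.
function srcValues(string $html, string $query): array
{
    $previous = libxml_use_internal_errors(true); // silence libxml parse warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $values = [];
    foreach ($xpath->evaluate($query) as $attr) {
        $values[] = $attr->value; // each match is a DOMAttr node
    }
    return $values;
}

// Hypothetical sample: one td containing two links, each wrapping one image.
$html = '<table><tr><td>'
      . '<a href="#"><img src="/one.jpg"></a>'
      . '<a href="#"><img src="/two.jpg"></a>'
      . '</td></tr></table>';

print_r(srcValues($html, '//td/descendant::img[1]/@src')); // only /one.jpg
print_r(srcValues($html, '//td//img[1]/@src'));            // /one.jpg and /two.jpg
```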
Output:
http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
The HTML in the question contains only a single td, and, more importantly, the img is inside an a element with an id attribute. An id must be unique within a document, so the expression can match only one element. That allows you to cast the node list to a string directly in XPath and return the value.
$dom = new DOMDocument();
$dom->loadHtml($html);
$xpath = new DOMXPath($dom);
echo $xpath->evaluate(
'string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)'
);
Output:
http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
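A caveat with the string() cast: per XPath 1.0 it returns the value of the first node in document order, or an empty string when nothing matches, so a missing image and an empty src look the same. A sketch that keeps the node list instead and returns null when there is no match; the function name firstZoomerImgSrcOrNull is hypothetical.

```php
<?php
// Hypothetical helper: like the string() version, but returns null
// when no img is found inside a#Zoomer.
function firstZoomerImgSrcOrNull(string $html): ?string
{
    $previous = libxml_use_internal_errors(true); // silence libxml parse warnings
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Same path without the string() cast, so the result is a DOMNodeList.
    $nodes = $xpath->query('//td/a[@id = "Zoomer"]/descendant::img[1]/@src');
    return $nodes->length > 0 ? $nodes->item(0)->value : null;
}
```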
You can try using this regex.
$html = 'Your HTML';
$reg = '/img src=["\']?([^"\' ]*)["\' ]/';
preg_match_all($reg, $html, $m);
$arr = array_map(function($v){
    return trim(str_replace(array('img src=', 'http://www.theworkingmanstore.com'), '', $v), '"');
}, $m[0]);
print '<pre>';
print_r($arr);
print '</pre>';
Output:
Array
(
[0] => /images/products/display/GR14_EXTRALARGE.jpg
[1] => /images/products/detail/GR14_EXTRALARGE.jpg
[2] => /images/products/thumb/GR14_EXTRALARGE.jpg
[3] => /images/products/thumb/GR14_8_EXTRALARGE.jpg
[4] => /images/products/thumb/GR14_5_EXTRALARGE.jpg
[5] => /images/products/thumb/GR14_3_EXTRALARGE.jpg
[6] => /images/products/thumb/GR14_42_EXTRALARGE.jpg
[7] => /images/products/thumb/GR14_2_EXTRALARGE.jpg
)
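If you stay with a regex, you can read a capture group directly instead of stripping "img src=" and the quotes from the full match in $m[0]. A sketch, assuming src values are always quoted: the backreference \1 makes the closing quote match the opening one, so single- and double-quoted values both come out clean. The function name imageSources and its $host parameter are hypothetical. Note that a regex like this still returns every image on the page; limiting it to the first img in a particular td is what the XPath approach handles well.

```php
<?php
// Hypothetical helper: collect quoted src values from <img> tags with a regex
// and strip an absolute host prefix such as 'http://www.theworkingmanstore.com'.
function imageSources(string $html, string $host): array
{
    // Group 1 captures the opening quote; \1 requires the same quote to close it.
    // Group 2 captures the attribute value itself, without quotes.
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $m);

    return array_map(
        fn (string $src): string => str_replace($host, '', $src),
        $m[2]
    );
}

$sample = "<img src=\"http://www.theworkingmanstore.com/images/a.jpg\">"
        . "<img class='x' src='/images/b.jpg'>";
print_r(imageSources($sample, 'http://www.theworkingmanstore.com'));
```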