
PHP preg_match Image from URL

I am trying to parse a website and grab the name or URL of the image.

Example URL: http://www.theworkingmanstore.com/georgia-gr14-infants-romeo.aspx

There are six or more images in a single <td>, and I only want the src of the first img in that <td>.

I'm sure it can be done with a DOM parser, but I have no experience with one.

Any assistance would be appreciated.

Thanks

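$url = 'http://www.theworkingmanstore.com/georgia-gr14-infants-romeo.aspx';
$html = file_get_contents($url);
$reg = '/img src=["\']?([^"\' ]*)["\' ]/';
preg_match_all($reg, $html, $m);
// $m[0] holds the full matches: strip the "img src=" prefix and the domain,
// then trim double quotes
$arr = array_map(function ($v) {
    return trim(str_replace(array('img src=', 'http://www.theworkingmanstore.com'), '', $v), '"');
}, $m[0]);
print_r($arr);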

Output from the regex:

Array
(
    [0] => /images/logo2.png
    [1] => /images/mod_head_category_lt.gif
    [2] => '/images/products/display/GR14_EXTRALARGE.jpg'
    [3] => '/images/products/thumb/GR14_EXTRALARGE.jpg'
    [4] => '/images/products/thumb/GR14_8_EXTRALARGE.jpg'
    [5] => '/images/products/thumb/GR14_5_EXTRALARGE.jpg'
    [6] => '/images/products/thumb/GR14_3_EXTRALARGE.jpg'
    [7] => '/images/products/thumb/GR14_42_EXTRALARGE.jpg'
    [8] => '/images/products/thumb/GR14_2_EXTRALARGE.jpg'
    [9] => /images/freeshipping.jpg
    [10] => /images/facebook_32.png
    [11] => images/twitter_32.png
    [12] => images/googleplus_32.png
    [13] => images/pinterest_32.png
    [14] => /images/payments.gif
    [15] => /images/brands/the-working-man.jpg
)
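In the output above, entries [2] to [8] keep their single quotes because the code maps over $m[0], the full matches including the quote characters, and trim(..., '"') removes only double quotes. The capture group ([^"\' ]*) already excludes quotes and spaces, so reading $m[1] avoids both the str_replace of img src= and the trim. A minimal sketch, assuming a hypothetical two-tag sample string in place of the real page:

```php
<?php
// Hypothetical sample: one double-quoted and one single-quoted src,
// mirroring the mix of quote styles seen in the output above.
$html = '<img src="/images/logo2.png"> <img src=\'/images/products/thumb/GR14_EXTRALARGE.jpg\'>';

$reg = '/img src=["\']?([^"\' ]*)["\' ]/';
preg_match_all($reg, $html, $m);

// $m[0]: full matches, including the "img src=" prefix and the quotes.
// $m[1]: only what the capture group ([^"\' ]*) matched, i.e. the URL itself.
$srcs = array_map(function ($v) {
    // Drop the site's domain so absolute and relative URLs look alike
    return str_replace('http://www.theworkingmanstore.com', '', $v);
}, $m[1]);

print_r($srcs);
```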

I tried the DOM parser suggestion:

$html = file_get_contents($url);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
echo $xpath->evaluate(
    'string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)'
);

I got this as output:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag nav invalid in Entity
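DOMDocument::loadHTML() uses libxml2's HTML parser, which (at least in the libxml2 versions PHP has long shipped with) follows HTML 4 rules, so HTML5 elements such as <nav> are reported as invalid. This is a warning rather than a fatal error, and the document is still parsed. One common way to keep such warnings out of the output is libxml_use_internal_errors(true), which buffers the messages so they can be read with libxml_get_errors() and cleared with libxml_clear_errors(). The sketch below assumes a small hypothetical stand-in for the product page rather than fetching the live URL:

```php
<?php
// Hypothetical stand-in for the product page. The HTML5 <nav> element is
// what triggers the "Tag nav invalid" warning in libxml's HTML parser.
$html = '<html><body>'
    . '<nav><a href="/">Home</a></nav>'
    . '<table><tr><td>'
    . '<a id="Zoomer" href="#"><img src="/images/products/display/GR14_EXTRALARGE.jpg"></a>'
    . '<img src="/images/products/thumb/GR14_EXTRALARGE.jpg">'
    . '</td></tr></table>'
    . '</body></html>';

// Buffer libxml errors instead of emitting them as PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML($html);

// The buffered messages can still be inspected, then cleared.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// The document was parsed anyway, so the original query works.
$xpath = new DOMXPath($dom);
$src = $xpath->evaluate('string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)');
echo $src, "\n";
```

Restoring the previous setting afterwards keeps the change local to this parse instead of silencing libxml errors for the rest of the script.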

In the DOM, everything is a node, including the img elements and their src attributes. XPath lets you fetch node lists from a DOM.

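// $html holds the page source, e.g. from file_get_contents($url)
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// //img/@src selects the src attribute node of every img in the document
foreach ($xpath->evaluate('//img/@src') as $src) {
  echo $src->value, "\n";
}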

Output:

http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
http://www.theworkingmanstore.com/images/products/detail/GR14_EXTRALARGE.jpg
/images/products/thumb/GR14_EXTRALARGE.jpg
/images/products/thumb/GR14_8_EXTRALARGE.jpg
/images/products/thumb/GR14_5_EXTRALARGE.jpg
/images/products/thumb/GR14_3_EXTRALARGE.jpg
/images/products/thumb/GR14_42_EXTRALARGE.jpg
/images/products/thumb/GR14_2_EXTRALARGE.jpg
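Because attributes are nodes too, //img/@src yields DOMAttr objects whose value property holds the URL. Selecting the img elements instead and calling getAttribute('src') on each gives the same strings. A short sketch on hypothetical inline markup:

```php
<?php
// Hypothetical sample markup with two images at different depths.
$html = '<html><body><img src="/a.jpg"><p><img src="/b.jpg"></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Selecting attribute nodes: each item is a DOMAttr.
$viaAttributes = [];
foreach ($xpath->evaluate('//img/@src') as $attr) {
    $viaAttributes[] = $attr->value;
}

// Selecting element nodes: each item is a DOMElement.
$viaElements = [];
foreach ($xpath->evaluate('//img') as $img) {
    $viaElements[] = $img->getAttribute('src');
}

print_r($viaAttributes);
```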

XPath allows quite complex conditions. The following example outputs the src attribute of the first img inside each td.

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// descendant::img[1] is evaluated per td: the first img below each td
foreach ($xpath->evaluate('//td/descendant::img[1]/@src') as $src) {
  echo $src->value, "\n";
}

Output:

http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
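Where the [1] predicate sits matters. In //td/descendant::img[1] it applies to the descendant axis of each td, so it selects the first img below each td. The abbreviated //td//img[1] expands to //td/descendant-or-self::node()/img[1] and selects every img that is the first img child of its own parent, while (//td//img)[1] selects only the first such img in the whole document. The markup and the helper srcValues() below are hypothetical, for illustration:

```php
<?php
// Hypothetical markup: the first td holds a linked display image followed by
// a div of thumbnails; the second td holds a single image.
$html = '<html><body><table><tr>'
    . '<td><a id="Zoomer"><img src="/display.jpg"></a>'
    . '<div><img src="/thumb1.jpg"><img src="/thumb2.jpg"></div></td>'
    . '<td><img src="/other.jpg"></td>'
    . '</tr></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical helper: collect the values of the attribute nodes an expression selects.
function srcValues(DOMXPath $xpath, string $expr): array
{
    $values = [];
    foreach ($xpath->evaluate($expr) as $attr) {
        $values[] = $attr->value;
    }
    return $values;
}

// First img descendant of each td.
$perTd = srcValues($xpath, '//td/descendant::img[1]/@src');
// Every img that is the first img child of its parent, below any td.
$perParent = srcValues($xpath, '//td//img[1]/@src');
// Only the first img inside any td, document-wide.
$firstOverall = srcValues($xpath, '(//td//img)[1]/@src');

print_r($perTd);
```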

The HTML in the question contains only a single td, and, more importantly, the img is inside an a element with an id attribute. Since an id must be unique within a document, the expression can match only one element. This allows casting the node list to a string directly in XPath: string() returns the value of the first node in the list (in document order), or an empty string if the list is empty.

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// string() casts the node list to the value of its first node ('' if empty)
echo $xpath->evaluate(
  'string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)'
);

Output:

http://www.theworkingmanstore.com/images/products/display/GR14_EXTRALARGE.jpg
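Since string() returns an empty string when nothing matches, the caller may want to tell a missing image apart from a real value. A minimal sketch wrapping the query in a hypothetical helper, zoomerImageSrc(), that returns null in that case and buffers libxml warnings such as the one about <nav>:

```php
<?php
// Hypothetical helper: returns the src of the first img inside the
// <a id="Zoomer"> element under a td, or null when there is none.
function zoomerImageSrc(string $html): ?string
{
    // Keep libxml parse warnings (e.g. unknown HTML5 tags) out of the output.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // string() yields the first matched node's value, or '' for an empty list.
    $src = $xpath->evaluate('string(//td/a[@id = "Zoomer"]/descendant::img[1]/@src)');
    return $src === '' ? null : $src;
}

var_dump(zoomerImageSrc('<table><tr><td><a id="Zoomer"><img src="/x.jpg"><img src="/y.jpg"></a></td></tr></table>'));
```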

You can try using this regex.

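$html = 'Your HTML';
// Matches "img src=" followed by an optionally quoted value
$reg = '/img src=["\']?([^"\' ]*)["\' ]/';
preg_match_all($reg, $html, $m);
// Strip the "img src=" prefix and the domain from each full match, then trim double quotes
$arr = array_map(function($v){
    return trim(str_replace(array('img src=', 'http://www.theworkingmanstore.com'), '', $v), '"');
}, $m[0]);

print '<pre>';
print_r($arr);
print '</pre>';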

Output:

Array
(
    [0] => /images/products/display/GR14_EXTRALARGE.jpg
    [1] => /images/products/detail/GR14_EXTRALARGE.jpg
    [2] => /images/products/thumb/GR14_EXTRALARGE.jpg
    [3] => /images/products/thumb/GR14_8_EXTRALARGE.jpg
    [4] => /images/products/thumb/GR14_5_EXTRALARGE.jpg
    [5] => /images/products/thumb/GR14_3_EXTRALARGE.jpg
    [6] => /images/products/thumb/GR14_42_EXTRALARGE.jpg
    [7] => /images/products/thumb/GR14_2_EXTRALARGE.jpg
)
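The regex depends on src= directly following "img " with a single space, so it misses tags where another attribute comes first. A small comparison on hypothetical markup, running the same pattern and a DOM/XPath query side by side:

```php
<?php
// Hypothetical markup: the first img has a class attribute before src.
$html = '<img class="thumb" src="/a.jpg"><img src="/b.jpg">';

// The regex from the answer only matches when "src=" directly follows "img ".
preg_match_all('/img src=["\']?([^"\' ]*)["\' ]/', $html, $m);
$viaRegex = $m[1];

// A DOM parser reads the attribute regardless of its position in the tag.
$dom = new DOMDocument();
$dom->loadHTML($html);
$viaDom = [];
foreach ((new DOMXPath($dom))->evaluate('//img/@src') as $attr) {
    $viaDom[] = $attr->value;
}

print_r($viaRegex);
print_r($viaDom);
```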

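Statement: technical posts on this site are shared under the CC BY-SA 4.0 license. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM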