Regex to extract images from HTML - how to get only JPGs?

I am using this PHP function to grab all <img> tags within any given HTML.

function extract_images($content)
{
    $img    = strip_tags(html_entity_decode($content),'<img>');
    $regex  = '~src="[^"]*"~';    

    preg_match_all($regex, $img, $all_images);

    return $all_images;
}

This works and returns all images (GIF, PNG, JPG, etc.).

Does anyone know how to change the regex...

~src="[^"]*"~

in order to only get files with a JPG or JPEG extension?

Thanks a bunch.

Sooner or later the Regex Enforcement Agency will show up. It might as well be me :)

The proper way to do this is with a proper HTML DOM parser. Here's a DOMDocument solution. It is more robust than parsing the HTML with a regex, and it also lets you access or modify other HTML attributes on your <img> nodes at the same time.

$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

// Get all images
$imgs = $dom->getElementsByTagName("img");
foreach($imgs as $img) {
  // Check the src attr of each img
  $src = $img->getAttribute("src");
  if (preg_match("/\.jpe?g$/i", $src)) {
    // Add it onto your $links array.
    $links[] = $src;
  }
}

See the other answers for a simple regex solution, or adapt the regex inside my foreach loop.
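
For anyone who wants the DOMDocument approach as a drop-in function, here is a minimal sketch. The function name extract_jpg_sources is hypothetical, and the sketch makes two assumptions that the snippet above does not: that parser warnings on sloppy real-world HTML should be silenced with libxml_use_internal_errors, and that a src with a query string (such as photo.jpg?v=2) should still count as a JPG.

```php
<?php
// Hypothetical helper, not part of the original answer.
function extract_jpg_sources(string $html): array
{
    // loadHTML() rejects empty input, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();

    // Assumption: collect libxml warnings internally instead of printing
    // them, since real-world HTML is often not well-formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');

        // Assumption: test only the URL path, so a trailing query string
        // does not hide the extension.
        $path = parse_url($src, PHP_URL_PATH);
        if (is_string($path) && preg_match('/\.jpe?g$/i', $path)) {
            $links[] = $src;
        }
    }

    return $links;
}

$html = '<p><img src="a.jpg"><img src="b.png"><img src="c.JPEG?v=2"></p>';
print_r(extract_jpg_sources($html));
```

If you need the strict behavior of the loop above (extension at the very end of src), run preg_match on $src directly instead of on the parsed path.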

/src="[^"]*\.(jpg|jpeg)"/i

i -> case-insensitive match
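
As a rough sketch of how that pattern could slot into the function from the question: the name extract_jpg_images is hypothetical, the tilde delimiters are kept from the original function, and jpe?g is just a shorter spelling of (jpg|jpeg). Unlike the original, this version adds a capture group and returns only the URLs, not the whole src="..." match; that is a design choice for illustration, not something the question asked for.

```php
<?php
// Hypothetical variant of the asker's extract_images().
function extract_jpg_images(string $content): array
{
    // Same preprocessing as the original: keep only <img> tags.
    $img = strip_tags(html_entity_decode($content), '<img>');

    // Group 1 captures the URL; the i flag makes .JPG and .JPEG match too.
    $regex = '~src="([^"]*\.jpe?g)"~i';

    preg_match_all($regex, $img, $matches);

    // $matches[1] holds the captured URLs without the src="..." wrapper.
    return $matches[1];
}

$html = '<img src="a.jpg"><img src="b.gif"><img src="c.JPEG">';
print_r(extract_jpg_images($html));
```

Like the original, this only sees double-quoted src attributes, so single-quoted or unquoted values are skipped.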
