I am trying to get the data-src
and the data-srcset
attributes from a string of many images in php. Both attributes are optional, that means, there can be zero, only data-src
, only data-srcset
or both. The regex I have is
<img(.*?)data-src=['\\"](.*?)['\\"].*?|(data-srcset=['\\"](.*?)['\\"])?\\/>
The string i am testing against is:
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
But it is too greedy. Look here:
https://regex101.com/r/vDQE3C/1
Any help (also logical) is very much appreciated.
Don't use regex for parsing html code. Better to use DOM
parser like this:
$html = <<< EOF
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif" alt="" data-id="1037" data-link="http://localhost:3000/detektivhut/" class="wp-image-1037"/>
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png" alt="" data-id="948" data-link="http://localhost:3000/dsc04828-2/" class="wp-image-948" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
<li class="blocks-gallery-item">
<figure>
<img data-src="http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png" alt="" data-id="883" data-link="http://localhost:3000/2018/11/13/single-page-style-1/dsc04831-2/" class="wp-image-883" data-srcset="//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w" sizes="(max-width: 1067px) 100vw, 1067px" />
</figure>
</li>
EOF;
$xpath = new DOMXPath(@DOMDocument::loadHTML($html));
$images = $xpath->evaluate("//img");
foreach($images as $img){
if (($el = $img->attributes->getNamedItem('data-src')) != null)
echo 'data-src=' . $el->nodeValue . "\n";
if (($el = $img->attributes->getNamedItem('data-srcset')) != null)
echo 'data-srcset=' . $el->nodeValue . "\n";
}
Output:
data-src=http://localhost:3000/wp-content/uploads/2018/11/detektivhut.gif
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04828.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04828.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04828-1000x1500.png 1000w
data-src=http://localhost:3000/wp-content/uploads/2018/11/DSC04831.png
data-srcset=//localhost:3000/wp-content/uploads/2018/11/DSC04831.png 1067w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-200x300.png 200w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-768x1152.png 768w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-683x1024.png 683w, //localhost:3000/wp-content/uploads/2018/11/DSC04831-1000x1500.png 1000w
You just need to account for anything that comes between the data-attributes*
and the image closing tags />
. You needed another (.*?)
.
<img(.*?)data-src=['\\"](.*?)['\\"].*?data-srcset=['\\"](.*?)['\\"](.*?)\\/>
And if you only want to capture the data-attributes*
consider using non-capturing groups like follows. So that the $1
and $2
variables contain only the data you want, and not the whole image tag.
<img(?:.*?)data-src=['\\"](.*?)['\\"].*?data-srcset=['\\"](.*?)['\\"](?:.*?)\\/>
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.