Matching SRC attribute of IMG tag using preg_match

I'm attempting to run preg_match to extract the SRC attribute from the first IMG tag in an article (in this case, stored in $row->introtext).

preg_match('/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i', $row->introtext, $matches);

Instead of getting something like

images/stories/otakuzoku1.jpg

from

<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku's store" />

I get just

0

The regex should be right, but I can't tell why it appears to be matching the border attribute and not the src attribute.

Alternatively, if you've had the patience to read this far without skipping straight to the reply field and typing 'use a HTML/XML parser', can a good tutorial for one be recommended as I'm having trouble finding one at all that's applicable to PHP 4.

PHP 4.4.7

Your expression is incorrect. Try:

preg_match('/< *img[^>]*src *= *["\']?([^"\']*)/i', $row->introtext, $matches);

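Note the removal of the square brackets around img and src. Inside brackets they form character classes that match a single character, so [src] followed by = matched the r at the end of border, which is why you got 0. The unnecessary backslash escapes were also removed and {0,1} was shortened to ?.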
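
To make the difference concrete, here is a small side-by-side run of both patterns against the markup from the question. The variable names ($html, $broken, $fixed) are only for illustration and are not part of the original answer:

```php
<?php
// Sample markup taken from the question.
$html = '<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" />';

// Original pattern: [img] and [src] are character classes, each matching ONE character.
$broken = '/\< *[img][^\>]*[src] *= *[\"\']{0,1}([^\"\']*)/i';

// Corrected pattern: img and src are matched as literal words.
$fixed = '/< *img[^>]*src *= *["\']?([^"\']*)/i';

preg_match($broken, $html, $brokenMatches);
preg_match($fixed, $html, $fixedMatches);

echo $brokenMatches[1], "\n"; // 0
echo $fixedMatches[1], "\n";  // images/stories/otakuzoku1.jpg
```

The greedy [^>]* runs to the end of the tag and backtracks, so the broken pattern settles on the last character from s, r, c that is followed by =, which is the r in border.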

Here's a way to do it with built-in functions (PHP >= 4). Note that this uses an XML parser, so it only works when the markup is well-formed XML:

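// $html holds the markup to search, e.g. $row->introtext.
// Case folding is on by default, so tag and attribute names
// are reported in upper case.
$parser = xml_parser_create();
xml_parse_into_struct($parser, $html, $values);
foreach ($values as $key => $val) {
    // Stop at the first IMG element that carries a SRC attribute.
    if ($val['tag'] == 'IMG' && isset($val['attributes']['SRC'])) {
        $first_src = $val['attributes']['SRC'];
        break;
    }
}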

echo $first_src;  // images/stories/otakuzoku1.jpg

If you need to use preg_match() itself, try this:

 preg_match('/(?<!_)src=([\'"])?(.*?)\\1/',$content, $matches);
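
The answer does not say which element of $matches holds the value, so here is a short sketch of how the pattern above behaves. The second piece of markup (with a data_src attribute) is a made-up example to show what the (?<!_) lookbehind is for:

```php
<?php
// Pattern from the answer above; the doubled backslash yields the backreference \1.
$pattern = '/(?<!_)src=([\'"])?(.*?)\\1/';

// Sample markup from the question.
$content = '<img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" />';
preg_match($pattern, $content, $matches);

echo $matches[1], "\n"; // the opening quote character: "
echo $matches[2], "\n"; // images/stories/otakuzoku1.jpg

// Hypothetical markup: data_src is skipped because it is preceded by an underscore.
$lazy = '<img data_src="placeholder.gif" src="real.jpg" />';
preg_match($pattern, $lazy, $lazyMatches);

echo $lazyMatches[2], "\n"; // real.jpg
```

Group 1 captures the quote character and group 2 the attribute value, so the value you want is in $matches[2], not $matches[1].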

Try:

include ("htmlparser.inc"); // from: http://php-html.sourceforge.net/

$html = 'bla <img src="images/stories/otakuzoku1.jpg" border="0" alt="Inside Otakuzoku\'s store" /> noise <img src="das" /> foo';

$parser = new HtmlParser($html);

while($parser->parse()) {
    if($parser->iNodeName == 'img') {
        echo $parser->iNodeAttributes['src'];
        break;
    }
}

which will produce:

images/stories/otakuzoku1.jpg

It should work with PHP 4.x.

The regex I used was much simpler. My code assumes that the string being passed to it contains exactly one img tag with no other markup:

$pattern = '/src="([^"]*)"/';

See my answer here for more info: How to extract img src, title and alt from html using php?
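
For completeness, a minimal usage sketch of that simple pattern. The sample strings are illustrative only. The second one shows the main limitation, which is that the pattern expects double quotes around the value:

```php
<?php
$pattern = '/src="([^"]*)"/';

// A string containing exactly one img tag, as the answer assumes.
$tag = '<img src="images/stories/otakuzoku1.jpg" border="0" />';
if (preg_match($pattern, $tag, $matches)) {
    echo $matches[1], "\n"; // images/stories/otakuzoku1.jpg
}

// Hypothetical single-quoted variant: the pattern does not match it.
$singleQuoted = "<img src='images/stories/otakuzoku1.jpg' />";
var_dump(preg_match($pattern, $singleQuoted)); // int(0)
```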

This task should be handled by a DOM parser, because a regex knows nothing about the DOM.

Code:

$row = (object)['introtext' => '<div>test</div><img src="source1"><p>text</p><img src="source2"><br>'];

$dom = new DOMDocument();
$dom->loadHTML($row->introtext);
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src');

Output:

source1

This says:

  1. Parse the whole HTML string
  2. Isolate all of the img tags
  3. Isolate the first img tag
  4. Isolate its src attribute value

Clean, appropriate, easy to read and manage.
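
If the article might contain no img tag at all, or sloppy HTML that makes libxml emit warnings, the same four steps can be wrapped in a small helper. This is only a sketch: the function name first_img_src is made up for this example, and it needs PHP 5 or later with the DOM extension, so it will not run on PHP 4:

```php
<?php
// Hypothetical helper (not from the original answer):
// returns the src of the first img tag, or null if there is none.
function first_img_src($html)
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // item(0) is null when the document has no img element.
    $img = $dom->getElementsByTagName('img')->item(0);
    if ($img === null || !$img->hasAttribute('src')) {
        return null;
    }

    return $img->getAttribute('src');
}

echo first_img_src('<div>test</div><img src="source1"><p>text</p><img src="source2"><br>'), "\n"; // source1
var_dump(first_img_src('<p>no images here</p>')); // NULL
```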


 