
Extracting <img> tags from HTML using JavaScript

I have an HTML page with many elements (tables, divs, etc.). I get the page as a string and want to extract tags of the form <img src="(whatever char).jpg" (whatever char)> from that string. I tried some regexp tutorials but couldn't get anywhere, as it was too complicated for me. I only need the first occurrence. Thanks.

I don't think regex is the right way to go about this:

var all_images = document.getElementsByTagName('img');
var filtered_images = [];

for (var i = 0; i < all_images.length; i++) {
    var image = all_images[i];

    if (image.hasAttribute('src')) {
        filtered_images.push(image);
    }
}

If you were using jQuery, the code would be much simpler:

var images = $('img[src]');

Is this what you need?

"<img src=\"aaa.jpg\" (whatever char)>".match(/src="([^"]*)"/)[1]

I agree with Blender: use the DOM instead. Regexp is not a good solution here.
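Since the question starts from a string rather than a live page, here is a minimal sketch of how the DOM approach could still be applied: parse the string with the browser's DOMParser and query the resulting document. The function name firstJpgImage and the sample markup are made up for illustration, and the sketch assumes a browser environment (it simply returns null where DOMParser is not available).

```javascript
// Sketch: parse an HTML string into a detached document and return the first
// <img> element whose src ends in ".jpg". Returns null if there is no such
// element, or if DOMParser is unavailable (e.g. outside a browser).
// firstJpgImage is a hypothetical helper name, not part of any library.
function firstJpgImage(htmlString) {
    if (typeof DOMParser === 'undefined') {
        return null;
    }

    var doc = new DOMParser().parseFromString(htmlString, 'text/html');

    // [src$=".jpg"] is the CSS "attribute value ends with" selector;
    // querySelector returns only the first match in document order.
    return doc.querySelector('img[src$=".jpg"]');
}

var img = firstJpgImage('<div><img src="a.png"><img src="b.jpg" alt="B"></div>');

if (img) {
    // outerHTML gives the tag back as a string, which is what was asked for.
    console.log(img.outerHTML);
}
```

Because querySelector already stops at the first match, no loop or filtering is needed for the "first occurrence only" requirement.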

Obligatory link to the answer explaining why you should think twice about using regular expressions to parse HTML: RegEx match open tags except XHTML self-contained tags

That being said, I wonder why you have a website's HTML code as a string rather than as a DOM tree, and why you need to manipulate it in JavaScript. That looks like a quite uncommon use case. When your script runs on the website you want to parse, you can use document.getElementsByTagName("img") to get an array-like collection of all image DOM nodes on the page. But if you really have the source code of ANOTHER website as a string and want to parse it, try this regular expression:

<img.*?src="(.*?)"
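To tie this back to the question (first occurrence only, src ending in .jpg, whole tag returned), a hedged sketch along the same lines could look like the following. The helper name firstJpgImgTag and the sample string are invented for illustration, and the pattern is a tightened variant of the regex above, not a general HTML parser: it assumes double-quoted attribute values and will also match attributes such as data-src.

```javascript
// Sketch: return the first <img ...> tag in an HTML string whose src value
// ends in ".jpg", or null if there is none.
// firstJpgImgTag is a hypothetical helper name used only for this example.
function firstJpgImgTag(html) {
    // [^>]* keeps the match inside a single tag; without the "g" flag,
    // String.prototype.match returns only the first match.
    var match = html.match(/<img\b[^>]*?\bsrc="[^"]*\.jpg"[^>]*>/i);
    return match ? match[0] : null;
}

var sample = '<div><img alt="x" src="a.png"><img src="pics/b.jpg" width="10"><img src="c.jpg"></div>';

console.log(firstJpgImgTag(sample));
```

Using [^>]* and [^"]* instead of .*? stops the pattern from running past the end of the tag or the attribute value, which is the usual failure mode of the looser version.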


 