简体   繁体   中英

Regex for capturing repeated groups Javascript

I have some test data in the following format -

"lorem ipsum <img src='some_url' class='some_class' /> lorem ipsum <img src='some_url' class='some_class' /> ipsum <img src='some_url' class='some_class' />"

Now, my goal is to identify all the image tags along with their respective source urls and css classes and store them together with the remaining text in an ordered array like -

["lorem ipsum", {imageObject1}, "lorem ipsum", {imageObject2}, "ipsum", {imageObject3}]

Now for this I tried to create a sample regex

var regex = /(.*(<img\s+src=['"](.+)['"]\s+(class=['"].+['"])?\s+\/>)+?.*)+/ig

Now when I try this regex with the sample text i am getting -

regex.exec(sample_text) => [0:"lorem ipsum <img src='some_url1' class='some_class1' /> lorem ipsum <img src='some_url2' class='some_class2' /> ipsum <img src='some_url3' class='some_class3' />"
1:"lorem ipsum <img src='some_url1' class='some_class1' /> lorem ipsum <img src='some_url2' class='some_class2' /> ipsum <img src='some_url3' class='some_class3' />"
2:"<img src='some_url3' class='some_class3' />"
3:"some_url3"
4:"class='some_class3'"]

How in javascript can I transform the sample html text into an array of tagged html objects with their attributes.

Do not use regular expressions to parse HTML . Use a DOMParser to parse the string and then CSS queries to get the images from the DOM, it will be much more reliable and easier to read.

var html = "lorem ipsum <img src='some_url' class='some_class' /> lorem ipsum <img src='some_url' class='some_class' /> ipsum <img src='some_url' class='some_class' />"

var nodes = new DOMParser().parseFromString(html, "text/html").body.childNodes

That will get you almost what you wanted (just some empty Text nodes you can filter out).

Or do something a little bit more accurate like this in case you don't have just images and text in the HTML:

var images = new DOMParser().parseFromString(html, "text/html").querySelectorAll("img")
var array = new Map([...images].map(img => [img.previousSibling.nodeValue, img]))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM