
Grab all html tags from string including their content (Regex Only)

I am trying to get all HTML tags, without exception, from a string. To clarify, it must work strictly on the string, without converting it into an HTML object. I created one regex, but it only grabs the tags without their content.

 var text = '<div class="mura-region-local"><p>In October 2010, Lisa and Eugene Jeffers learned that their daughter Jade, then nearly 2 and a half years old, has autism. The diagnosis felt like a double whammy. The parents were soon engulfed by stress from juggling Jade\'s new therapy appointments and wrangling with their health insurance provider, but they now had an infant son to worry about, too. Autism runs in families. Would Bradley follow in his big sister\'s footsteps?</p></div><img href=""/>';
 var match = text.match(/<?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[\^'">\s]+))?)+\s*|\s*)?>/g);
 console.log(match);

A regex cannot reliably pair every <smth> with its matching </smth> for all possible tags. Tags can nest to any depth (tagA inside tagB, tagB inside tagA, and so on), and a regex cannot recognize that in general. You would have to write out every such combination explicitly, which makes a regex like that impossible in practice.
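To illustrate the nesting problem, here is a minimal sketch of a naive pairing attempt. The pattern `pairRegex` and the sample string `nested` are made up for this example and are not from the question; the pattern uses a backreference to find the first closing tag with the same name as the opening tag.

```javascript
// Hypothetical naive pattern: an opening tag, then as little text as possible,
// then the first closing tag with the same name (backreference \1).
const pairRegex = /<(\w+)[^>]*>[\s\S]*?<\/\1>/g;

// Illustrative input: a <div> nested inside another <div>.
const nested = '<div><div>inner</div> tail</div>';

const pairs = nested.match(pairRegex);
console.log(pairs);
// [ '<div><div>inner</div>' ]
```

The outer `<div>` gets paired with the inner `</div>`, so the match cuts the outer element short and the inner element is never reported on its own.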

But if you only want to extract the <smth ...>, </smth> and <smth .../> tags themselves, without checking that they appear in the correct order, it IS possible.

<(?:\w+(?:\s+\w+=(?:"[^"]*"|'[^']*'))*\/?|(?:\/\w+))>

Here is the test.
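As a quick check, the pattern above can be run against a shortened version of the string from the question. This is only a sketch: the names `tagRegex`, `html` and `tags` are chosen for illustration, and the paragraph text has been abbreviated.

```javascript
// The answer's pattern as a global regex literal.
const tagRegex = /<(?:\w+(?:\s+\w+=(?:"[^"]*"|'[^']*'))*\/?|(?:\/\w+))>/g;

// Shortened sample based on the question's input.
const html = '<div class="mura-region-local"><p>Autism runs in families.</p></div><img href=""/>';

// With the g flag, match() returns every matched tag, in order of appearance.
const tags = html.match(tagRegex);
console.log(tags);
// [ '<div class="mura-region-local">', '<p>', '</p>', '</div>', '<img href=""/>' ]
```

The result lists opening, closing and self-closing tags as separate strings; the text between them is not captured. Note that the pattern only accepts attributes written as name="value" or name='value', so tags with unquoted or valueless attributes would be skipped.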


 