Grab all HTML tags from a string, including their content (regex only)

I am trying to get all HTML tags, without exception, from a string. To clarify, this must work strictly on the string, without converting it into an HTML object. I wrote a regex, but it only grabs the tags without their content.

 // Apostrophes inside the single-quoted string are escaped so the literal parses.
 var text = '<div class="mura-region-local"><p>In October 2010, Lisa and Eugene Jeffers learned that their daughter Jade, then nearly 2 and a half years old, has autism. The diagnosis felt like a double whammy. The parents were soon engulfed by stress from juggling Jade\'s new therapy appointments and wrangling with their health insurance provider, but they now had an infant son to worry about, too. Autism runs in families. Would Bradley follow in his big sister\'s footsteps?</p></div><img href=""/>';
 var match = text.match(/<?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[\^'">\s]+))?)+\s*|\s*)?>/g);
 console.log(match);

You can't find pairs of <smth>...</smth> for all possible tags with a regex. Likewise, you can't write a regex that recognizes tagA inside tagB and tagB inside tagA for all tags. You would have to spell out every such combination explicitly, and that makes such a regex impossible.
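To illustrate the nesting problem, here is a minimal sketch of a naive "paired tag" regex applied to a tag nested inside another tag of the same name. The pattern `naivePair` and the sample string `nested` are hypothetical and were made up for this example; they are not from the question.

```javascript
// Naive attempt: an opening tag, anything (lazily), then the matching closing tag.
// `naivePair` and `nested` are illustrative assumptions, not part of the question.
const naivePair = /<(\w+)[^>]*>[\s\S]*?<\/\1>/g;

// A <div> nested inside another <div>.
const nested = '<div><div>a</div>b</div>';

const pairs = nested.match(naivePair);
console.log(pairs);
// → [ '<div><div>a</div>' ]
// The lazy quantifier stops at the first </div>, so the outer opening tag
// is paired with the inner closing tag and the real outer pair is never found.
```

Switching to a greedy quantifier does not fix this in general: it would pair the first opening tag with the last closing tag of that name, which is wrong as soon as two sibling elements share a tag name.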

But if you only want to extract the <smth ...>, </smth> and <smth .../> tags themselves, without checking that they appear in the correct order, it IS possible.

<(?:\w+(?:\s+\w+=(?:"[^"]*"|'[^']*'))*\/?|(?:\/\w+))>

Here is the test.
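As a rough sketch of how the regex above could be used from JavaScript, the snippet below applies it with the global flag to a short sample string. The variable names `tagPattern`, `sample` and `tags`, and the shortened sample markup, are assumptions made for this example.

```javascript
// The regex from the answer, with the global flag so match() returns every tag.
const tagPattern = /<(?:\w+(?:\s+\w+=(?:"[^"]*"|'[^']*'))*\/?|(?:\/\w+))>/g;

// Shortened, hypothetical sample in the same shape as the question's input.
const sample = '<div class="a"><p>Hello</p></div><img href=""/>';

// match() returns null when nothing matches, so fall back to an empty array.
const tags = sample.match(tagPattern) || [];
console.log(tags);
// → [ '<div class="a">', '<p>', '</p>', '</div>', '<img href=""/>' ]
```

Note that this returns the tags only, not the text between them. The pattern also expects every attribute to have a quoted value and attribute names made of word characters, so it will skip a tag such as `<img src="x" />` (whitespace before the slash) or one with a hyphenated attribute like `data-id="1"`.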
