How can I find HTML-like tags in a string using JavaScript?
I have the following string:
var originalStr = "Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end."
What's the best way to identify all tags, their corresponding tag names, and their content? This is the kind of result I'm looking for:
var tagsFound =
[ { "tagName": "firstTag", "value": "text inside first tag" }
, { "tagName": "secondTag", "value": "50" }
]
HTML is very complicated to parse, so the best approach is to use a parser that already exists.
If you're doing this in a browser, you can use the one built into the browser: DOMParser.
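As a rough sketch of the browser route, the snippet below parses the string with DOMParser and maps the resulting elements to the shape asked for in the question. The helper name `collectTags` is made up for this illustration, and it only looks at direct children of the body, which is enough for the sample string but is an assumption about your input.

```javascript
// Collect { tagName, value } pairs from the direct element children of a
// DOM node. It only relies on .children, .tagName and .textContent.
function collectTags(root) {
  return Array.from(root.children, (el) => ({
    // The HTML parser normalizes tag names, so the original
    // capitalization ("firstTag") is not available here.
    tagName: el.tagName.toLowerCase(),
    value: el.textContent,
  }));
}

// Browser-only part: DOMParser is not defined in plain Node.js.
if (typeof DOMParser !== "undefined") {
  const originalStr =
    "Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.";
  const doc = new DOMParser().parseFromString(originalStr, "text/html");
  console.log(collectTags(doc.body));
}
```

Note that the tag names come back lowercased (`firsttag`, `secondtag`) because of the parser's normalization.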
If you're doing this in Node.js, there are several libraries for it, such as jsdom. It provides an API almost identical to the one in web browsers.
Here's a jsdom example:
const { JSDOM } = require("jsdom");
const dom = new JSDOM("<!doctype html>" + originalStr);
const doc = dom.window.document;
// Each custom tag becomes a direct child element of <body>
for (const childElement of doc.body.children) {
    console.log(`${childElement.tagName} - ${childElement.textContent}`);
}
With your string, that would output:
FIRSTTAG - text inside first tag
SECONDTAG - 50
You'd then use the DOM methods provided to build the output you're looking for. (Note the tag name normalization above: the parser does not keep the original capitalization. If that matters for what you're doing, you may have to use jsdom's nodeLocation, which requires creating the JSDOM instance with the includeNodeLocations option, to find each tag's position in the source and read the original name from there.)
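One possible way to recover the original capitalization is sketched below. It assumes the location object returned by jsdom's `dom.nodeLocation(node)` carries a `startTag` entry with `startOffset` and `endOffset`, and slices the tag name out of the source text. Both `originalTagName` and `findTagsWithJsdom` are hypothetical helper names, and the `JSDOM` class is passed in as a parameter only so the sketch makes no assumption about how you import it.

```javascript
// Read a tag name exactly as written in the source, given a location
// object shaped like the one dom.nodeLocation(node) returns.
function originalTagName(html, location) {
  if (!location || !location.startTag) return null; // no source position known
  const { startOffset, endOffset } = location.startTag;
  const startTag = html.slice(startOffset, endOffset); // e.g. "<firstTag>"
  const match = /^<([^\s/>]+)/.exec(startTag);
  return match ? match[1] : null;
}

// Hypothetical wiring; call it with the class from require("jsdom").
function findTagsWithJsdom(JSDOM, source) {
  // Offsets refer to the exact string given to JSDOM, prefix included.
  const html = "<!doctype html>" + source;
  // Locations are only recorded when includeNodeLocations is enabled.
  const dom = new JSDOM(html, { includeNodeLocations: true });
  return Array.from(dom.window.document.body.children, (el) => ({
    tagName: originalTagName(html, dom.nodeLocation(el)),
    value: el.textContent,
  }));
}
```

Usage would look like `findTagsWithJsdom(require("jsdom").JSDOM, originalStr)`.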
Depending on the complexity of the strings you're dealing with, a simple regex solution might work (it works nicely for your string):
var str = 'Test example <firstTag>text inside first tag</firstTag>, <secondTag>50</secondTag> end.';
var tagsFound = [];
str.replace(/<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g, function (m, m1, m2) {
    // write data to the result object
    tagsFound.push({
        "tagName": m1,
        "value": m2
    });
    // return the match unchanged, so the string is not modified
    return m;
});
// Displaying the results
for (var i = 0; i < tagsFound.length; i++) {
    console.log(tagsFound[i]);
}
This breaks down when self-closing tags or tags containing other tags are taken into account, like <selfClosedTag/> or <tag><tag>something</tag>else</tag>.
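To make those failure modes concrete, here is a small sketch that reuses the same regex through `String.prototype.matchAll`. The names `TAG_PATTERN` and `findTags` are made up for this illustration.

```javascript
// Same pattern as in the regex answer above.
const TAG_PATTERN = /<([a-zA-Z][a-zA-Z0-9_-]*)\b[^>]*>(.*?)<\/\1>/g;

// Return every { tagName, value } pair the pattern can find in str.
function findTags(str) {
  return Array.from(str.matchAll(TAG_PATTERN), (m) => ({
    tagName: m[1],
    value: m[2],
  }));
}

// A lone self-closing tag has no closing tag, so it is skipped entirely.
console.log(findTags("<selfClosedTag/>"));
// → []

// A self-closing tag followed by a same-named tag is mistaken for its opener.
console.log(findTags("<item/><item>x</item>"));
// → [ { tagName: 'item', value: '<item>x' } ]

// Nested same-named tags: the match stops at the first closing tag,
// so the inner markup leaks into the value and "else" is lost.
console.log(findTags("<tag><tag>something</tag>else</tag>"));
// → [ { tagName: 'tag', value: '<tag>something' } ]
```

If your input can contain such constructs, a real parser is the safer choice.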