
How can I get only the tags of HTML?

How can I get only the HTML tags with Node.js?

I have this:

<html>
<head>
Hi
</head>
<body>
<center id="fantastic">
Hi , hello
</center>
</body>
</html>

I want to delete "Hi" and "Hi , hello" and keep only the tags, and I also want to remove the id="fantastic" attribute. Any ideas? Is there a regular expression for this?

Assuming you have the source HTML in a JavaScript string, that it is legal HTML, and that the HTML attributes don't contain ">" or "<" characters, this should work:

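var source = "your html here";

// Collect every tag; match() returns null when nothing matches, so fall back to [].
var result = (source.match(/<[^>]*>/g) || []).map(function(item) {
    // Drop whitespace after "<", then drop everything from the first
    // whitespace up to the closing ">" or "/>" (this removes the attributes).
    return item.replace(/<\s+/, "<").replace(/\s[\s\S]*?(\/?>)$/, "$1");
}).join("");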

Working demo: http://jsfiddle.net/jfriend00/6q0gyugd/

This uses a regex to isolate just the HTML tags into an array, then uses .map() to iterate over that array, removing any leading whitespace in each tag and then any attributes, and finally joins the results back into a string of HTML.
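As a quick check, the same two-step approach can be run against the markup from the question. This is only a sketch: the function name `stripToTags` and the `sample` variable are made up for illustration, the tag regex is a lightly hardened variant (it also accepts tags that span several lines and tolerates input with no tags at all), and the same assumptions apply (legal HTML, no ">" or "<" inside attribute values).

```javascript
// Hypothetical wrapper around the regex approach described above.
function stripToTags(source) {
    // match() returns null when there are no tags, so fall back to [].
    var tags = source.match(/<[^>]*>/g) || [];
    return tags.map(function(item) {
        return item
            .replace(/<\s+/, "<")                   // "<  div" -> "<div"
            .replace(/\s[\s\S]*?(\/?>)$/, "$1");    // drop attributes, keep ">" or "/>"
    }).join("");
}

// The markup from the question.
var sample = [
    "<html>",
    "<head>",
    "Hi",
    "</head>",
    "<body>",
    "<center id=\"fantastic\">",
    "Hi , hello",
    "</center>",
    "</body>",
    "</html>"
].join("\n");

console.log(stripToTags(sample));
// prints: <html><head></head><body><center></center></body></html>
```

Text between tags is never matched, so it is dropped without any extra step, and the join("") also removes the line breaks that separated the tags.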


To be as robust as possible with any legal HTML, you may as well use an actual HTML parser (which can be smarter than any regex can possibly be) to parse the HTML, then walk the parsed tree to output just the tags.
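To illustrate why the regex has its limits, here is a minimal sketch of a quote-aware scanner that tolerates ">" inside quoted attribute values and skips comments. It is not a real HTML parser and not part of the original answer; the function name `extractTags` is hypothetical, and it still assumes that "<" appears in the input only at the start of a tag or comment.

```javascript
// Hypothetical helper: one pass over the string, tracking quotes inside tags.
function extractTags(source) {
    var out = "";
    var i = 0;
    while (i < source.length) {
        if (source[i] !== "<") { i++; continue; }      // text outside tags is dropped

        // Skip comments whole, so quotes inside them cannot confuse the scan.
        if (source.startsWith("<!--", i)) {
            var end = source.indexOf("-->", i + 4);
            if (end === -1) break;                     // unterminated comment: stop
            i = end + 3;
            continue;
        }

        // Find the ">" that ends this tag, ignoring ">" inside quoted values.
        var j = i + 1;
        var quote = null;
        while (j < source.length) {
            var ch = source[j];
            if (quote) {
                if (ch === quote) quote = null;
            } else if (ch === '"' || ch === "'") {
                quote = ch;
            } else if (ch === ">") {
                break;
            }
            j++;
        }
        if (j >= source.length) break;                 // unterminated tag: stop

        var tag = source.slice(i, j + 1);
        // Capture the optional "/" of a closing tag and the tag name.
        var m = /^<\s*(\/?)\s*([A-Za-z][^\s\/>]*)/.exec(tag);
        if (m) {
            var selfClosing = m[1] === "" && /\/\s*>$/.test(tag);
            out += "<" + m[1] + m[2] + (selfClosing ? "/>" : ">");
        }
        i = j + 1;
    }
    return out;
}

var tricky = "<p title=\"a > b\">Hi <br /> there</p><!-- it's a note --><i>x</i>";
console.log(extractTags(tricky));
// prints: <p><br/></p><i></i>
```

Declarations such as a doctype are dropped because their name does not start with a letter. Anything beyond these cases (script contents, malformed markup, implied tags) is where a real parser earns its keep.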

You can try using a library like cheerio: github.com/cheeriojs/cheerio
