How to replace all HTML tags from <anything> to \n<anything>\n [using regexp (JavaScript)]
How can I replace every HTML tag so that <anything> becomes \n<anything> and </anything> becomes </anything>\n?
var text = "<anything>welcome</anything><anything>Hello</anything>";
Result:
var text = "\n<anything>welcome</anything>\n\n<anything>Hello</anything>\n";
This regex will help you (it matches all tags):
</?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)/?>
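As a rough sketch of how that pattern could be used from JavaScript: in a regex literal each "/" has to be escaped, and the g flag is needed to find every tag. The variable names and the second sample string are my own assumptions, not part of the answer.

```javascript
// The answer's pattern as a JavaScript regex literal (each "/" escaped, g flag added).
var tagPattern = /<\/?\w+((\s+\w+(\s*=\s*(?:".*?"|'.*?'|[^'">\s]+))?)+\s*|\s*)\/?>/g;

// Input from the question.
var text = "<anything>welcome</anything><anything>Hello</anything>";
var tags = text.match(tagPattern);
console.log(tags);

// A second, made-up sample with attributes and a self-closing tag.
var withAttrs = '<a href="x.html" class=big>link</a><br />';
var attrTags = withAttrs.match(tagPattern);
console.log(attrTags);
```

Once the tags are matched, the same pattern can be handed to replace() with a callback that decides where the newline goes.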
You can prettify XML without regex (note that new XML(...) is E4X, which current browsers do not support):
var text = "<anything>welcome</anything><anything>Hello</anything>";
var xml = new XML("<root>" + text + "</root>");
console.log(xml.children().toXMLString());
Output:
<anything>welcome</anything>
<anything>Hello</anything>
Just don't parse HTML using regex. Read this: http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html
In JavaScript, you can turn HTML into DOM using the .innerHTML property, and after that you can use other DOM methods to traverse it.
Simple example (needs Firebug):
var div = document.createElement('div');
var html = '<p>foo <span>bar</span><br /></p>';
div.innerHTML = html; // let the browser parse the HTML into DOM nodes

function scan(node, depth)
{
    depth = depth || 0;
    var is_tag = node.nodeType == 1; // 1 = element node
    var self_contained = false;
    if (is_tag) {
        // an element without children is printed as a self-closing tag
        self_contained = node.childNodes.length == 0;
        var tag_name = node.tagName.toLowerCase();
        console.log('<' + tag_name + (self_contained ? ' /' : '') + '>', depth);
    } else {
        console.log(node.data); // text node content
    }
    for (var i = 0, n = node.childNodes.length; i < n; i++) {
        scan(node.childNodes[i], depth + 1);
    }
    if (!self_contained && is_tag) {
        console.log('</' + tag_name + '>', depth);
    }
}

scan(div);
Output:
<div> 0
<p> 1
foo
<span> 2
bar
</span> 2
<br /> 2
</p> 1
</div> 0
You could also modify this to output attributes and use the depth argument for indentation.
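A minimal sketch of that modification, assuming you want the lines collected in an array rather than logged one by one. The function name describe, the two-space indent and the sample HTML are my own choices. Attribute values are not escaped here, so a value containing a double quote would come out wrong.

```javascript
// Hypothetical variant of scan(): collects indented lines and includes attributes.
function describe(node, depth, lines) {
    depth = depth || 0;
    lines = lines || [];
    var pad = new Array(depth + 1).join('  '); // two spaces per nesting level
    if (node.nodeType === 1) { // element node
        var name = node.tagName.toLowerCase();
        var attrs = '';
        for (var i = 0; i < node.attributes.length; i++) {
            var a = node.attributes[i];
            attrs += ' ' + a.name + '="' + a.value + '"'; // no escaping (assumption)
        }
        if (node.childNodes.length === 0) {
            lines.push(pad + '<' + name + attrs + ' />');
        } else {
            lines.push(pad + '<' + name + attrs + '>');
            for (var j = 0; j < node.childNodes.length; j++) {
                describe(node.childNodes[j], depth + 1, lines);
            }
            lines.push(pad + '</' + name + '>');
        }
    } else if (node.nodeType === 3) { // text node
        lines.push(pad + node.data);
    }
    return lines;
}

// Browser-only usage; skipped where there is no document object.
if (typeof document !== 'undefined') {
    var div = document.createElement('div');
    div.innerHTML = '<p class="x">foo <span>bar</span><br /></p>';
    console.log(describe(div).join('\n'));
}
```

Returning the lines instead of logging them makes it easy to join them with "\n" and get a single pretty-printed string.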
Try this:
str.replace(/<(\/?)[a-zA-Z]+(?:[^>"']+|"[^"]*"|'[^']*')*>/g, function($0, $1) {
    return $1 === "/" ? $0 + "\n" : "\n" + $0;
})
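For reference, here is the same replace call wrapped in a function and run against the input from the question. The wrapper name addNewlinesAroundTags is made up for illustration; the regex and the callback are unchanged.

```javascript
// Hypothetical wrapper; the regex and callback are the ones from the answer.
function addNewlinesAroundTags(str) {
    return str.replace(/<(\/?)[a-zA-Z]+(?:[^>"']+|"[^"]*"|'[^']*')*>/g, function ($0, $1) {
        // $1 is "/" for a closing tag: newline goes after it, otherwise before it
        return $1 === "/" ? $0 + "\n" : "\n" + $0;
    });
}

var input = "<anything>welcome</anything><anything>Hello</anything>";
var output = addNewlinesAroundTags(input);
console.log(JSON.stringify(output));
// prints "\n<anything>welcome</anything>\n\n<anything>Hello</anything>\n"
```

Note that replace() returns a new string, so the result has to be assigned somewhere; str itself is not modified.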
Expanding on @Amarghosh's answer:
Assuming the HTML you are trying to parse is more complicated than your example (which I would guess it is), you may want to convert your HTML page into XHTML. This will allow you to treat it as XML and do a number of things including:
I have done this in the past with a free .NET library called SGML.
text = text.replace(/<(?!\/)/g, "\n<"); // replace every < that is not followed by / with \n<
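That line only covers opening tags. A possible way to also put a newline after each closing tag is to chain a second replace, sketched below. The function name is made up, and the second regex is my own addition rather than part of the answer. Like the original one-liner, it will also break on any literal < that appears in text content.

```javascript
// Hypothetical helper combining the answer's replace with a second one for closing tags.
function splitTagsOntoLines(text) {
    return text
        .replace(/<(?!\/)/g, "\n<")       // from the answer: newline before each opening tag
        .replace(/(<\/[^>]*>)/g, "$1\n"); // assumption: newline after each closing tag
}

var sample = "<anything>welcome</anything><anything>Hello</anything>";
var formatted = splitTagsOntoLines(sample);
console.log(JSON.stringify(formatted));
// prints "\n<anything>welcome</anything>\n\n<anything>Hello</anything>\n"
```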