
Remove all html attributes with regex (replace)

For example, I have HTML like this:

<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper">< div class="modal-content"><div class="modal-body full long"><div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>

How can I remove all attributes (style, class, id, etc.) from this HTML?

I have this regex:

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i

What is wrong with it? How can I delete all HTML attributes with a regex?

Here is a fiddle:

http://jsfiddle.net/qL4maxn0/1/

You should not use regex here.

var html = '<title>Ololo - text’s life</title><div class="page-wrap"><div class="ng-scope"><div class="modal custom article ng-scope in" id="new-article" aria-hidden="false" style="display: block;"><div class="modal-dialog first-modal-wrapper"><div class="modal-content"><div class="modal-body full long">                        <div class="form-group">olololo<ul style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);"><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li>                            </ul><p style="color: rgb(85, 85, 85);background-color: rgb(255, 255, 255);">bbcvbcvbcvbcvbcvbcvbcvb</p></div><div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div class="page-wrap"></div></div>';
var div = document.createElement('div');
div.innerHTML = html;

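// Remove every attribute from a single element.
// Iterate backwards because element.attributes is a live collection.
function removeAllAttrs(element) {
    for (var i = element.attributes.length - 1; i >= 0; i--) {
        element.removeAttributeNode(element.attributes[i]);
    }
}

// Recursively strip the attributes of all descendants of el.
function removeAttributes(el) {
    var children = el.children;
    for (var i = 0; i < children.length; i++) {
        var child = children[i];
        removeAllAttrs(child);
        if (child.children.length) {
            removeAttributes(child);
        }
    }
}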
removeAttributes(div);
console.log(div.innerHTML);
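
As a variation on the recursive walk, the same result can probably be reached with a flat loop over every descendant. This is only a sketch: the function name `stripAllAttributes` is made up for illustration, and it assumes `root` is a DOM element (or anything else exposing `querySelectorAll`).

```javascript
// Hypothetical helper: strips the attributes of every descendant of root.
// Like the recursive version, it leaves root's own attributes alone.
function stripAllAttributes(root) {
    var elements = root.querySelectorAll('*');
    for (var i = 0; i < elements.length; i++) {
        var el = elements[i];
        // Copy the names first, because el.attributes changes as we remove.
        var names = Array.from(el.attributes, function (attr) {
            return attr.name;
        });
        names.forEach(function (name) {
            el.removeAttribute(name);
        });
    }
    return root;
}
```

With the `div` built in the snippet before it, this would be called as `stripAllAttributes(div)` in place of `removeAttributes(div)`.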

Working Fiddle

Source

You're missing the g flag to make the replace global.

/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig
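
To see what the flag changes, here is a small sketch. The sample string and the replacement pattern `'<$1$2>'` are assumptions, since the question does not show the actual replace call.

```javascript
// Hypothetical sample input; not taken from the question.
const sample = '<div class="a"><p id="x">hi</p><br/></div>';

// Assumed replacement: keep the tag name ($1) and an optional trailing slash ($2).
const replacement = '<$1$2>';

// Without g, only the first matching tag is rewritten.
const firstOnly = sample.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/i, replacement);

// With g, every opening or self-closing tag is rewritten.
const allTags = sample.replace(/<([a-z][a-z0-9]*)[^>]*?(\/?)>/ig, replacement);

console.log(firstOnly); // <div><p id="x">hi</p><br/></div>
console.log(allTags);   // <div><p>hi</p><br/></div>
```

Closing tags such as `</p>` are left alone because the pattern requires a letter right after `<`.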

Also, if you're doing this for security purposes, look into using a proper HTML sanitizer: Sanitize/Rewrite HTML on the Client Side

First of all, I would advise you not to use regexes in this situation; they are not meant to parse tree-shaped structures like HTML.

If, however, you don't have a choice, I think a regex can handle the problem as stated.

It looks to me like you forgot to allow for spaces, accented characters, and so on. You can rely on the fact that the greater-than (>) and less-than (<) signs are not supposed to appear unescaped in text content.

/<\s*([a-z][a-z0-9]*)\s.*?>/gi

and call it with:

result = body.replace(regex, '<$1>')

For your given sample, it produces:

<title>Ololo - text’s life</title><div><div><div><div><div><div><div>olololo<ul><li>texttext</li><li>Filter the events lists by host.</li><li>Create graphs for separate hosts and for the groups of hosts.</li></ul><p>bbcvbcvbcvbcvbcvbcvbcvb</p></div></div></div></div></div></div><title>cvbcbcvbcvbcvbccb</title><div></div></div>
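
One caveat is worth checking before relying on this pattern: it stops at the first `>` it sees, so a `>` inside a quoted attribute value cuts the match short. The sketch below wraps the regex in a function with a made-up name, `stripWithRegex`, and runs it on two hypothetical inputs that are not part of the question.

```javascript
// Hypothetical wrapper around the regex from this answer.
function stripWithRegex(html) {
    return html.replace(/<\s*([a-z][a-z0-9]*)\s.*?>/gi, '<$1>');
}

// Ordinary attributes are removed as intended.
const clean = stripWithRegex('<p style="color: red;">text</p>');
console.log(clean); // <p>text</p>

// A '>' inside an attribute value ends the match too early,
// so the rest of the tag leaks into the output as text.
const broken = stripWithRegex('<a title="a > b" href="#">x</a>');
console.log(broken); // <a> b" href="#">x</a>
```

This is the kind of input where the DOM-based approach is the safer choice.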
