
Regular expression to remove all tags with content and HTML code from a string

I am looking to develop a regular expression that removes all HTML tags along with their names, script tags and all content inside them (basically all JavaScript code), and any other HTML code, so that no HTML or JavaScript code in the string gets through.

Update:

I think the question was not very clear; maybe this is clearer.

I want '<' and '>' to be disallowed in the string, along with any special characters like ;,# etc. I don't care whether the tag is "<html>", "<body>" or anything else; I just want to return false so that the user cannot enter any tag at all. I also want to block all JavaScript, so I am assuming that if I don't allow < and >, the script tag won't pass and JS code won't pass either?

So the regex should simply disallow <, > and other special characters like ;#@$%& etc., so that HTML code other than tags, e.g. &nbsp;, is also blocked.

To check whether an HTML element or a string contains HTML tags, use the following JavaScript function:

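function containsHTMLTags(str)
{
        // True when the string contains "<", any run of non-">" characters, then ">".
        // The flat pattern avoids the nested quantifier of ([^\>]{1,})*, which can backtrack badly.
        return /<[^>]*>/.test(str);
}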

The function uses black-list filtering.
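
As a rough illustration of what this kind of black-list check does and does not catch, the sketch below re-declares the function with an equivalent, simplified pattern so that it runs on its own. The sample inputs are made up for the example.

```javascript
// Re-declared here so the snippet is self-contained.
function containsHTMLTags(str) {
    return /<[^>]*>/.test(str);
}

// Illustrative inputs only; they are not from the original answer.
var samples = ['plain text', '<b>bold</b>', 'a < b', '&nbsp;', '<script src=x'];
var results = samples.map(function (s) {
    return containsHTMLTags(s);
});

console.log(results); // [ false, true, false, false, false ]
```

An entity such as &nbsp; and an unclosed tag both pass this check, so it does not cover the stricter requirement stated in the question's update.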

Reference: http://www.hscripts.com/scripts/JavaScript/html-tag-validation.php

^[^<>;#]*$

If the string matches that regex, it doesn't contain any of the characters listed in the brackets. I hope I understood your question correctly.
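
A minimal sketch of how that pattern might be used for validation in JavaScript. The helper name isAllowed is hypothetical and not part of the answer.

```javascript
// Pattern from the answer: the whole string must consist of characters
// other than <, >, ; and #.
var SAFE_INPUT = /^[^<>;#]*$/;

// isAllowed is a hypothetical helper name used only for this example.
function isAllowed(str) {
    return SAFE_INPUT.test(str);
}

console.log(isAllowed('hello world')); // true
console.log(isAllowed('<script>'));    // false, contains < and >
console.log(isAllowed('a&nbsp;b'));    // false, contains ;
console.log(isAllowed(''));            // true, the empty string has no forbidden characters
```

To block more characters, such as @$%& from the question, they can be added inside the brackets.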

Don't use a regular expression for that.

You can't use textContent or innerText because at least the former returns the body of script elements.

If I were only supporting newer browsers and had access to (or shimmed) Array.prototype.indexOf(), Array.prototype.reduce() and Array.prototype.map(), here is what I might use...

var getText = function me(node, excludeElements) {

    // Normalize the exclusion list to lower-case tag names.
    if (!(excludeElements instanceof Array)) {
        excludeElements = [];
    } else {
        excludeElements = excludeElements.map(function(element) {
            return element.toLowerCase();
        });
    }

    return [].slice.call(node.childNodes).reduce(function(str, node) {
        var nodeType = node.nodeType;
        switch (nodeType) {
        case 3: // text node
            return str + node.data;
        case 1: // element node
            if (excludeElements.indexOf(node.tagName.toLowerCase()) == -1) {
                return str + me(node, excludeElements);
            }
        }
        // Excluded elements and other node types contribute nothing;
        // keep the text accumulated so far.
        return str;
    }, '');

}

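
To show how such a traversal behaves without a browser, here is a self-contained sketch. It repeats the function with the operator-precedence, discarded-map and accumulator issues corrected, and runs it on plain objects that stand in for DOM nodes. The text() and element() helpers and the sample tree are hypothetical, introduced only for this example.

```javascript
// Corrected variant of the function above, repeated so the snippet runs alone.
function getText(node, excludeElements) {
    var excluded = Array.isArray(excludeElements)
        ? excludeElements.map(function (name) { return name.toLowerCase(); })
        : [];

    return [].slice.call(node.childNodes).reduce(function (str, child) {
        if (child.nodeType === 3) {            // text node
            return str + child.data;
        }
        if (child.nodeType === 1 &&            // element node that is not excluded
            excluded.indexOf(child.tagName.toLowerCase()) === -1) {
            return str + getText(child, excluded);
        }
        return str;                            // skip excluded elements and other node types
    }, '');
}

// Hypothetical stand-ins for DOM nodes, so the example runs outside a browser.
function text(data) {
    return { nodeType: 3, data: data, childNodes: [] };
}
function element(tagName, children) {
    return { nodeType: 1, tagName: tagName, childNodes: children };
}

var body = element('BODY', [
    text('Hello '),
    element('SCRIPT', [text('alert(1);')]),
    element('B', [text('world')]),
    text('!')
]);

console.log(getText(body, ['script'])); // Hello world!
console.log(getText(body));             // Hello alert(1);world!
```

In a browser the call would presumably look like getText(document.body, ['script', 'style']).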

Regex.Replace(html, @"<script[^>]*>[\s\S]*?</script>|<[^>]+>", "", RegexOptions.IgnoreCase).Trim();

Here, html is a string containing the HTML of a page from which the HTML and script tags need to be removed.
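
A JavaScript sketch of the same idea, assuming the pattern is meant to match either a whole script element with its content or any single tag. The pattern, the helper name stripTags and the sample page are assumptions made for this example.

```javascript
// Assumed pattern: a script element together with its content, or any single tag.
var TAGS_AND_SCRIPTS = /<script[^>]*>[\s\S]*?<\/script>|<[^>]+>/gi;

// stripTags is a hypothetical helper name used only for this example.
function stripTags(html) {
    return html.replace(TAGS_AND_SCRIPTS, '').trim();
}

var page = '<html><body><script>var x = "<b>";</script>' +
           '<p>Hello <b>world</b></p></body></html>';

console.log(stripTags(page)); // Hello world
```

This only handles simple markup. A closing tag written as "</script >", for instance, would not be matched by the first alternative, which is one reason the DOM-based approach above is the safer choice.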
