Regex to whitelist HTML tags
I'm trying to create a regex that whitelists a small set of HTML tags.
/<(\/)?(code|em|ul)(\/)?>$/
But there are a few cases where this fails:
<em style="padding: 10px">
So I tried /<(\/)?(code|em|ul)(.|\n)*?(\/)?>$/, but this also allows:
<emadchgasgh style="padding: 10px">
Cases that need to be whitelisted:
<em> - Success
</em> - Success
<br/> - Success
<em style="asdcasc"> - Success
<emacjhasjdhc> - Failure
Question: What else could be added to the regex?
/<\s*\/?\s*(code|em|ul|br)\b.*?>/
\s*\/?\s*
There may be whitespace, and an optional slash, before the tag name.
(code|em|ul|br)\b
Matches only a whole tag name.
.*?>
Matches everything up to the next > character.
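To see how the pattern above behaves, here is a minimal sketch that runs it over the cases listed in the question. The names allowedTag, cases and results are made up for illustration; only the regex itself comes from the answer.

```javascript
// Pattern copied from the answer; everything else here is illustrative.
const allowedTag = /<\s*\/?\s*(code|em|ul|br)\b.*?>/;

// Sample inputs taken from the question's list of cases.
const cases = [
  '<em>',
  '</em>',
  '<br/>',
  '<em style="asdcasc">',
  '<emacjhasjdhc>',
];

// Pair each input with whether the pattern matches it.
const results = cases.map((tag) => [tag, allowedTag.test(tag)]);

for (const [tag, ok] of results) {
  console.log(`${tag} -> ${ok ? 'Success' : 'Failure'}`);
}
```

Two caveats worth keeping in mind: \b treats any non-word character as a boundary, so a name such as <em-x> would also match, and the pattern is not anchored, so it only reports that an allowed tag appears somewhere in the string, not that every tag in it is allowed.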
On the client side, parse the text into a document with DOMParser and use querySelector to select an element that is not code, em, ul, or br, with the query string:
*:not(code):not(em):not(ul):not(br)
If anything is returned, the string does not pass.
const test = (str) => {
  const doc = new DOMParser().parseFromString(str, 'text/html');
  return !doc.body.querySelector('*:not(code):not(em):not(ul):not(br)');
};
console.log(test('foo <br> bar'));
console.log(test('foo <code>code here</code> bar <br>'));
console.log(test('foo <div>not allowed</div>'));
In Java, you can use Jsoup to parse a given HTML string and then select elements inside it, e.g.:
Document doc = Jsoup.parse(input);
Elements forbiddenElements = doc.select("*:not(code):not(em):not(ul):not(br)");
If forbiddenElements has anything in it, the string contains forbidden elements.