Regex help with replacing <html> tags
I need to extend the regex below so that it also selects <code> tags with a class, e.g. <code class="lol">
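var text = 'This is <i>encoded text</i> but this is <b>bold</b >!';
var html = $('<div/>')
    .text(text)  // set as text, so the markup is entity-encoded
    .html()      // read the encoded string back (&lt;i&gt;encoded text&lt;/i&gt; ...)
    .replace(new RegExp('&lt;(/)?(b|i|u)\\s*&gt;', 'gi'), '<$1$2>');  // restore the allowed tags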
Can anyone please help?
I'm guessing something like &lt;(/)?(b|i|u|code|pre)?( class="")\\s*&gt; ??
Many thanks
Parsing HTML with a regex is a bad idea; see this answer.
The easiest way would be to simply use some of jQuery's DOM manipulation functions to remove the formatting.
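// Load the markup into the div first; an empty div has nothing to find
var $div = $('<div/>').html(text);
$div.find("b, i, code").each(function() {  // "code" already covers code.lol
    // Replace each matched element with its plain-text content
    $(this).replaceWith($(this).text());
});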
This will replace the whole tag, including everything in it (class, id, etc.):
    .replace(new RegExp('&lt;(/)?(b|u|i|code|pre)(.*?)&gt;', 'gim'), '<$1$2$3>');
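One behaviour of the pattern above is worth checking before relying on it: nothing separates the tag name from `(.*?)`, so a short name such as `i` also matches the start of a longer name such as `img`. The sketch below assumes the input is the entity-encoded string returned by `.html()`; the sample string and the `\b` variant are illustrative, not part of the original answer.

```javascript
// Assumption: entity-encoded input, as produced by $('<div/>').text(s).html().
const encoded = '&lt;b class="x"&gt;bold&lt;/b&gt; &lt;img src=x onerror=alert(1)&gt;';

// The pattern from the answer above, written in entity form (assumed).
const loose = /&lt;(\/)?(b|u|i|code|pre)(.*?)&gt;/gim;

// Hypothetical tightening: \b stops "i" from matching the start of "img".
const bounded = /&lt;(\/)?(b|u|i|code|pre)\b(.*?)&gt;/gim;

const looseResult = encoded.replace(loose, '<$1$2$3>');
const boundedResult = encoded.replace(bounded, '<$1$2$3>');

console.log(looseResult);
// <b class="x">bold</b> <img src=x onerror=alert(1)>
console.log(boundedResult);
// <b class="x">bold</b> &lt;img src=x onerror=alert(1)&gt;
```

Even with the word boundary, every attribute on an allowed tag is still passed through unchanged, so this variant only narrows which tag names are restored.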
Matching a code tag with a class in an encoded string is hard (maybe impossible); it's easy when the code tag is in a fixed format (<code class="whatever">):
    .replace(new RegExp('&lt;(?:(code\\sclass=".*?")|(/)?(b|u|i|code|pre)(?:.*?))&gt;', 'gim'), '<$1$2$3>');
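The fixed-format idea can be tried without jQuery. The following is a minimal sketch, not the answerer's code: `escapeHtml` is a hypothetical stand-in that roughly mirrors what `.text(text).html()` does to `&`, `<` and `>`, and `restoreAllowedTags` is an illustrative helper that restores only a short whitelist, accepting a class on `<code>` only when it is double-quoted and made of word characters, spaces and hyphens.

```javascript
// Hypothetical stand-in for $('<div/>').text(s).html(): escapes &, < and >.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Illustrative helper: turn a small whitelist of encoded tags back into markup.
function restoreAllowedTags(encoded) {
  return encoded
    // opening <code class="..."> in a fixed format with simple class names
    .replace(/&lt;code\s+class="([\w\s-]*)"\s*&gt;/gi, '<code class="$1">')
    // attribute-free opening or closing tags; whitespace before '>' is dropped
    .replace(/&lt;(\/)?(b|i|u|code|pre)\s*&gt;/gi, '<$1$2>');
}

const text = 'This is <i>encoded text</i> but this is <b>bold</b >! ' +
  '<code class="lol">x</code> <script>alert(1)</script>';
const html = restoreAllowedTags(escapeHtml(text));

console.log(html);
// This is <i>encoded text</i> but this is <b>bold</b>! <code class="lol">x</code> &lt;script&gt;alert(1)&lt;/script&gt;
```

Handling the class in its own `replace` call keeps each pattern simple; anything that does not fit the fixed format, such as the `<script>` tag here, stays encoded.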
I wouldn't use a regex for parsing markup, but if it's just a string snippet, something like this would be sufficient. It should be noted that the regex you're using is overburdened by the \s*. Because the whitespace is optional, it also matches tags that have none, so it goes through the overhead of replacing them with the exact same thing. Better to use \s+
regex: <(/?(?:b|i|u)|code\s[^>]*class\s*=\s*(['"]).*?\2[^>]*?)\s+>
replace: <$1>
modifiers: sgi
<                       # '<' opening markup char
(                       # capture group 1
    /?                  # optional element termination
    (?:                 # grouping, non-capture
        b|i|u           # elements 'b', 'i', or 'u'
    )                   # end grouping
  |                     # OR,
    code                # element 'code' only
    \s [^>]*            # followed by a space and possibly any chars except '>'
    class \s* = \s*     # 'class' attribute '=' something
    (['"]) .*? \2       # value delimiter, then some possible chars, then delimiter
    [^>]*?              # followed by possibly any chars not '>'
)                       # end capture group 1
\s+                     # one or more whitespace chars: this is what is being removed
>                       # '>' closing markup char
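As a quick check of the pattern broken down above, here it is as a JavaScript literal applied to unencoded markup. The sample string is made up for illustration, and the `s` (dotAll) modifier assumes an ES2018 or later engine.

```javascript
// The pattern from the breakdown above, collapsed onto one line.
const trailingSpace =
  /<(\/?(?:b|i|u)|code\s[^>]*class\s*=\s*(['"]).*?\2[^>]*?)\s+>/gis;

// Hypothetical sample with stray whitespace before each '>'.
const sample = 'This is <b >bold</b > and <code class="lol" >x</code >';

// Group 1 holds the tag without its trailing whitespace.
const cleaned = sample.replace(trailingSpace, '<$1>');

console.log(cleaned);
// This is <b>bold</b> and <code class="lol">x</code >
```

The closing `</code >` is left untouched because the optional `/` belongs only to the b|i|u alternative; the code alternative matches opening tags that carry a class attribute.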