Regex help with replacing <html> tags

I need to extend the regex below so that it also matches <code> tags with a class, e.g. <code class="lol">

var text = 'This is <i>encoded text</i> but this is <b>bold</b >!';
var html = $('<div/>')
    .text(text)
    .html()
    .replace(new RegExp('&lt;(/)?(b|i|u)\\s*&gt;', 'gi'), '<$1$2>');

Can anyone please help?

I'm guessing something like &lt;(/)?(b|i|u|code|pre)?( class="")\\s*&gt; ??

Many thanks

Parsing HTML with a regex is a bad idea; see this answer.

The easiest way would be to simply use some of jQuery's DOM manipulation functions to remove the formatting.

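// Assumes `text` holds the markup string from the question.
var $div = $('<div/>').html(text);
$div.find("b, i, code, code.lol").each(function() {
    $(this).replaceWith($(this).text());
});
var plain = $div.html();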

Code example on jsfiddle.

This will replace the whole tag with everything in it (including class, id, etc.):

.replace(new RegExp('&lt;(/)?(b|u|i|code|pre)(.*?)&gt;', 'gim'), '<$1$2$3>');

Matching a code tag with a class in an encoded string is hard (maybe impossible) in the general case, but it is easy when the code tag is in a fixed format (<code class="whatever">):

.replace(new RegExp('&lt;(?:(code\\sclass=".*?")|(/)?(b|u|i|code|pre)(?:.*?))&gt;', 'gim'), '<$1$2$3>');
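One thing to watch with the (.*?) patterns above: because nothing forces the tag name to end, an encoded <img src=x> also matches (the "i" alternative followed by "mg src=x"). Below is a minimal sketch of a stricter whitelist, not a definitive implementation. The names escapeHtml, plainTag, classTag and allowTags are made up for this example, escapeHtml is a hypothetical stand-in for $('<div/>').text(text).html() so the snippet runs without jQuery, and the set of characters allowed in the class value is an assumption.

```javascript
// Hypothetical helper standing in for jQuery's $('<div/>').text(s).html():
// escapes the three characters that would otherwise be read as markup.
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Opening or closing tags without attributes: <b>, </b >, <code>, </pre>
var plainTag = /&lt;(\/?)(b|i|u|code|pre)\s*&gt;/gi;

// Opening <code>/<pre> tags with exactly one double-quoted class attribute.
// Assumption: class values contain only word characters, '-' and spaces.
var classTag = /&lt;(code|pre)\s+class\s*=\s*"([\w\- ]*)"\s*&gt;/gi;

function allowTags(text) {
  return escapeHtml(text)
    .replace(plainTag, '<$1$2>')
    .replace(classTag, '<$1 class="$2">');
}

// The permissive pattern from the answer, for comparison.
var permissive = new RegExp('&lt;(/)?(b|u|i|code|pre)(.*?)&gt;', 'gim');
var leaked = escapeHtml('<img src=x>').replace(permissive, '<$1$2$3>');

var out = allowTags('<i>x</i> <code class="lol">y</code> <img src=x> <b >z</b >');
console.log(leaked); // the <img> tag is restored by the permissive pattern
console.log(out);    // the <img> tag stays encoded with the stricter patterns
```

Splitting the work into two patterns keeps each one readable: closing tags never need to carry attributes, and only the tags that are allowed a class get the second pattern.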

I wouldn't use a regex for parsing markup, but if it's just a string snippet, something like this would be sufficient. It should be noted that the regex you're using does unnecessary work because of the \s*. Since the whitespace is optional, the pattern also matches tags that contain no whitespace at all and replaces them with the exact same text. Better to use \s+

regex: <(/?(?:b|i|u)|code\s[^>]*class\s*=\s*(['"]).*?\2[^>]*?)\s+>
replace: <$1>
modifiers: sgi

<                       # < Opening markup char
   (                       # Capture group 1
       /?                        # optional element termination
       (?:                       # grouping, non-capture
          b|i|u                    # elements 'b', 'i', or 'u'
       )                         # end grouping
    |                         # OR,
       code                      # element 'code' only
       \s [^>]*                  # followed by a space and possibly any chars except '>'
       class \s* = \s*           # 'class' attribute '=' something
         (['"]) .*? \2           # value delimiter, then some possible chars, then delimiter
       [^>]*?                    # followed by possibly any chars not '>'
   )                       # End capture group 1
   \s+                     # Here need 1 or more whitespace, what is being removed
>                      # > Closing markup char
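
As a rough illustration, the commented pattern above could be applied in JavaScript as follows. JavaScript regex literals have no free-spacing mode, so the pattern is written compactly, and the s (dotAll) flag assumes an ES2018 or later engine. The names trailingSpace and trimTagWhitespace are invented for this sketch.

```javascript
// Compact form of the commented breakdown above.
// Group 1: the tag body to keep; group 2: the quote character around the class value.
// Assumption: the engine supports the 's' (dotAll) flag (ES2018+).
var trailingSpace = /<(\/?(?:b|i|u)|code\s[^>]*class\s*=\s*(['"]).*?\2[^>]*?)\s+>/gis;

// Removes whitespace that sits directly before the closing '>' of a matched tag.
function trimTagWhitespace(html) {
  return html.replace(trailingSpace, '<$1>');
}

console.log(trimTagWhitespace('<b >bold</b >'));
console.log(trimTagWhitespace('<code class="lol" >x</code>'));
console.log(trimTagWhitespace('<b>untouched</b>')); // no whitespace, so no match
```

Because the pattern requires \s+ rather than \s*, tags that are already clean are skipped instead of being replaced with identical text.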
