
Strip Html from Text in JavaScript except p tags?

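I need to switch between RichEditor and TextEditor modes with JavaScript. To do that, I need to convert HTML to text while actually still in HTML editor mode, so I only need to keep the p tags; all other HTML can be stripped.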

Regex replace (globally, case-insensitively):

</?(?:(?!p\b)[^>])*>

with the empty string.

Explanation:

<          # "<"
/?         # optional "/"
(?:        # non-capture group
  (?!      #   negative look-ahead: a position not followed by...
    p\b    #     "p" and a word boundary
  )        #   end look-ahead
  [^>]     #   any char but ">"
)*         # end non-capture group, repeated as often as possible
>          # ">"
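
As a minimal sketch, the pattern above could be applied in JavaScript like this. The helper name stripExceptP is hypothetical and only introduced for illustration; the regex itself is the one given above, with the "g" and "i" flags.

```javascript
// The pattern from the answer, with "g" (global) and "i" (case-insensitive) flags.
const nonPTag = /<\/?(?:(?!p\b)[^>])*>/gi;

// Hypothetical helper: removes every tag the pattern matches.
function stripExceptP(html) {
  return html.replace(nonPTag, '');
}

console.log(stripExceptP('<div><p>Hello <b>world</b></p><br/></div>'));
// → <p>Hello world</p>
```

One caveat: because the look-ahead is checked at every character inside the tag, a tag that contains a standalone "p" anywhere (for example <span class="p">) is not matched either, so it is left in place.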

This is one of the few situations where applying regex to HTML can actually work.

Some might object that a literal "<" inside an attribute value is not actually forbidden, and could therefore break the above regex. They would be right.

The regex would break in this situation, replacing the underlined part:

<p class="foo" title="unusual < title">
                              ---------

If such a thing is possible with your input, then you might have to use a more advanced tool to do the job - a parser.
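
For illustration only, here is a rough sketch of a quote-aware scanner that copes with the "<" inside an attribute value shown above. It is not a real HTML parser: it assumes well-formed tags and does not handle comments, doctypes, or script/style contents. The function name stripTagsExceptP is hypothetical.

```javascript
// Hypothetical helper: a quote-aware scanner, NOT a full HTML parser.
// Assumption: comments, doctypes and <script>/<style> bodies do not occur.
function stripTagsExceptP(html) {
  let out = '';
  let i = 0;
  while (i < html.length) {
    const ch = html[i];
    // Treat "<" as a tag start only when a letter or "/" follows it.
    if (ch !== '<' || !/[a-z\/]/i.test(html[i + 1] || '')) {
      out += ch;
      i += 1;
      continue;
    }
    // Find the closing ">" while skipping over quoted attribute values.
    let j = i + 1;
    let quote = null;
    while (j < html.length) {
      const c = html[j];
      if (quote) {
        if (c === quote) quote = null; // end of the quoted value
      } else if (c === '"' || c === "'") {
        quote = c; // start of a quoted value
      } else if (c === '>') {
        break; // real end of the tag
      }
      j += 1;
    }
    if (j >= html.length) {
      // Unterminated tag: keep the remainder as plain text.
      out += html.slice(i);
      break;
    }
    const tag = html.slice(i, j + 1);
    // Keep the tag only if its name is exactly "p" (opening or closing).
    if (/^<\/?p(?=[\s\/>])/i.test(tag)) out += tag;
    i = j + 1;
  }
  return out;
}

console.log(stripTagsExceptP('<p class="foo" title="unusual < title">text</p><b>bold</b>'));
// → <p class="foo" title="unusual < title">text</p>bold
```

In a browser, the built-in DOM is usually a better choice than a hand-written scanner like this one.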

This should help

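var html = '<img src=""><p>content</p><span style="color: red">content</span>';
// replace() returns a new string; the original string is left unchanged
var stripped = html.replace(/<(?!\s*\/?\s*p\b)[^>]*>/gi, '');
// stripped is now '<p>content</p>content'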

Explanation of my regex:

Replace all parts

  1. beginning with "<",
  2. not followed by (negative look-ahead "(?!...)")
    • any number of white-space characters "\s*",
    • an optional "/" character,
    • any number of white-space characters again,
    • and the tag name followed by a word boundary (here "p\b"),
  3. containing any characters other than ">" ("[^>]*"),
  4. and ending with a ">" character.
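
Since the tag name is the only variable part in the steps above, the same pattern can be built for other tag names. The following is a sketch under that assumption; the function stripTagsExcept and its allowedTags parameter are hypothetical and not part of the original answer.

```javascript
// Hypothetical helper: builds the same look-ahead pattern for a list of tag names.
function stripTagsExcept(html, allowedTags) {
  // With no allowed tags, simply strip every tag.
  if (allowedTags.length === 0) {
    return html.replace(/<[^>]*>/g, '');
  }
  // Escape regex metacharacters in each name, then join them as alternatives.
  const names = allowedTags
    .map(function (t) { return t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'); })
    .join('|');
  // Same structure as the regex above: "<", not followed by an allowed name, up to ">".
  const pattern = new RegExp('<(?!\\s*\\/?\\s*(?:' + names + ')\\b)[^>]*>', 'gi');
  return html.replace(pattern, '');
}

var html = '<img src=""><p>content</p><span style="color: red">content</span>';
console.log(stripTagsExcept(html, ['p']));
// → <p>content</p>content
```

Like the original regex, this still assumes that no attribute value contains a literal ">".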
