How can I strip all HTML tags except the <sub> tag?

I need to remove all HTML tags, except when a tag:

  • is a <sub> tag
  • is preceded by one or more newlines followed by four or more spaces
  • is wrapped in backtick (`) characters.

Here is an example:

var str = "something1
           <sub>
             something2
             <div class='myclass'>something3</div>
           </sub>
           <div class='myclass'>something4</div>
           something5

               <div class='myclass'>something6</div>
           <div class='myclass'>something7</div>
           `<div>something8</div>`
           something9";

Expected output:

/*   
something1
<sub>
  something2
  something3
</sub>
something4
something5

    <div class='myclass'>something6</div>
something7
`<div>something8</div>`
something9
*/

Here is what I've tried so far:

/\n\s{0,3}<.*[^>]+|<sub>.*?<\/sub>|`.*?`/gm

This is possible with regex substitutions. Use this regex with mg modifiers:

(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>

And use $1 as the substitution.

There are several parts to this. The capturing group finds all the HTML you may want to keep:

  • \n\n    .* matches an empty line followed by a line that starts with 4 spaces.
  • `[^`]+` matches anything wrapped in backticks.
  • <\/?sub\b[^>]*> matches sub tags, opening or closing. The [^>]* allows optional attributes, so a bare <sub> also matches.

Every other HTML tag matches <[^>]+>. That alternative sits outside the capturing group, so $1 is empty for those matches and the tag is discarded.
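As a rough illustration, the substitution could be applied in JavaScript as sketched below. The names keepOrStrip, stripTags and sample are made up for this example, and the input is assumed to be the question's sample with the shared leading indentation removed. The sub branch uses [^>]* so that a bare <sub> with no attributes is kept.

```javascript
// Group 1 captures the text to keep; the final alternative matches tags to drop.
const keepOrStrip = /(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>/g;

// Hypothetical helper: replaces every match with the content of group 1.
function stripTags(input) {
  // "$1" expands to the empty string when group 1 did not take part in the match.
  return input.replace(keepOrStrip, "$1");
}

// Assumed input: the sample from the question, de-indented.
const sample = [
  "something1",
  "<sub>",
  "  something2",
  "  <div class='myclass'>something3</div>",
  "</sub>",
  "<div class='myclass'>something4</div>",
  "something5",
  "",
  "    <div class='myclass'>something6</div>",
  "<div class='myclass'>something7</div>",
  "`<div>something8</div>`",
  "something9",
].join("\n");

console.log(stripTags(sample));
```

The m flag is left out here because the pattern contains no ^ or $ anchors, so adding it would not change the result.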
