How can I strip all HTML tags except the <sub> tag?

Question

I need to remove all HTML tags except when:

the tag is a <sub> tag
the tag is preceded by one or more newlines followed by four or more spaces
the tag is surrounded by "`" characters.

Here is an example:

var str = "something1
           <sub>
             something2
             <div class='myclass'>something3</div>
           </sub>
           <div class='myclass'>something4</div>
           something5

               <div class='myclass'>something6</div>
           <div class='myclass'>something7</div>
           `<div>something8</div>`
           something9";

Expected output:

/*   
something1
<sub>
  something2
  something3
</sub>
something4
something5

    <div class='myclass'>something6</div>
something7
`<div>something8</div>`
something9
*/

Here is what I've tried so far:

/\n\s{0,3}<.*[^>]+|<sub>.*?<\/sub>|`.*?`/gm

Answer 1

This is possible with regex substitutions. Use this regex with mg modifiers:

(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>

And use $1 as the substitution.

There are several parts to this. The capturing group matches all the HTML you want to keep:

\n\n    .* matches an empty line followed by a line that starts with 4 spaces.
`[^`]+` matches anything in backticks.
<\/?sub\b[^>]*> matches sub tags, opening or closing. The [^>]* allows optional attributes, so a bare <sub> or </sub> also matches.

Any remaining HTML tag matches <[^>]+> outside the capturing group, so $1 is empty and the tag is discarded.
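As a rough illustration, the substitution might be applied in JavaScript as sketched below. The function name stripTagsExceptSub is hypothetical, and the input is a de-indented version of the question's example, which is an assumption about how the real string is laid out. The sub alternative is written with [^>]* so that bare <sub> and </sub>, which have nothing between the tag name and the closing >, are kept.

```javascript
// Hypothetical helper: strips every HTML tag except the three kept cases.
function stripTagsExceptSub(html) {
  // Group 1 holds what should be kept: a line after a blank line that starts
  // with 4 spaces, backtick-wrapped text, or a <sub> / </sub> tag.
  // Any other tag matches the last alternative, outside the group.
  const pattern = /(\n\n    .*|`[^`]+`|<\/?sub\b[^>]*>)|<[^>]+>/g;
  // "$1" is the kept text when group 1 matched and an empty string otherwise.
  return html.replace(pattern, "$1");
}

// Assumed input: the question's example with the leading indentation removed.
const input = [
  "something1",
  "<sub>",
  "  something2",
  "  <div class='myclass'>something3</div>",
  "</sub>",
  "<div class='myclass'>something4</div>",
  "something5",
  "",
  "    <div class='myclass'>something6</div>",
  "<div class='myclass'>something7</div>",
  "`<div>something8</div>`",
  "something9",
].join("\n");

console.log(stripTagsExceptSub(input));
```

Because the kept cases and the discarded case share one alternation, whichever alternative matches first at a given position wins, so a tag inside backticks or on an indented line is consumed by the capturing group before <[^>]+> can see it.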

1 answers

solution1
0 2016-09-07 15:35:14
