
JavaScript: remove \n or \t in HTML content except within pre tags

In JavaScript, how can I remove line breaks and tabs (\n or \t) from an HTML string, except inside <pre> tags?

I use this code to remove the line breaks and tabs:

htmlString.replace(/[\n\t]+/g,"");

However, it also removes the \n and \t characters inside <pre> tags. How can I fix this?
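
A minimal sketch of the problem, using a made-up sample string (the input below is an assumption for illustration, not from the question):

```javascript
// Hypothetical sample input: one paragraph and one <pre> block.
const htmlString = "<p>\n\tHello\n</p>\n<pre>\n\tline 1\n\tline 2\n</pre>";

// The replace from the question removes every newline and tab,
// including the ones that give the <pre> block its layout.
const flattened = htmlString.replace(/[\n\t]+/g, "");

console.log(flattened);
// prints: <p>Hello</p><pre>line 1line 2</pre>
```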

You can use a TreeWalker to visit every text node in the DOM and apply your regex only to the text nodes that are not inside a <pre> element:

    // Polyfill for Element.prototype.closest, from
    // https://developer.mozilla.org/en-US/docs/Web/API/Element/closest
    if (window.Element && !Element.prototype.closest) {
        Element.prototype.closest = function (s) {
            var matches = (this.document || this.ownerDocument).querySelectorAll(s),
                i,
                el = this;
            do {
                i = matches.length;
                while (--i >= 0 && matches.item(i) !== el) { }
            } while ((i < 0) && (el = el.parentElement));
            return el;
        };
    }

    document.getElementById("remove").addEventListener('click', function (e) {
        // Traverse every text node under <body>
        var walker = document.createTreeWalker(
            document.body,
            NodeFilter.SHOW_TEXT,
            null,
            false
        );
        var node;
        while (node = walker.nextNode()) {
            // Clean only text nodes that have no <pre> ancestor
            if (node.parentElement.closest('pre') === null) {
                node.textContent = node.textContent.replace(/[\n\t]+/g, "");
            }
        }
    });

    pre { background: #fffbec; }

    <button id="remove">Remove</button><br>
    <pre>
    this is a pre
        tag with tab
    </pre>
    <pre class="language-cpp">
    <code>
    void main() {
        printf("Hello");
    }
    </code>
    </pre>
    <p>
    first word
        new end</p>

You can start by matching the text that needs to be cleaned, which can only be one of the following:

  • Text from the beginning of the string to the next opening <pre> tag.
  • Text from a closing </pre> tag to the next opening <pre> tag.
  • Text from a closing </pre> tag to the end of the string.
  • Text from the beginning of the string to the end of the string (no pre elements in the string).

These cases can be described with the regex:

/(?:^|<\/pre>)[^]*?(?:<pre>|$)/g

where [^] matches any character, including newlines, and *? is a non-greedy quantifier that matches as few characters as possible. Note that [^] is specific to JavaScript regexes; [\s\S] is the portable equivalent.
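
To see which parts of a string the pattern selects, here is a small sketch. The sample string and the variable names are assumptions made for illustration:

```javascript
// The pattern from the answer: text outside <pre>...</pre>, plus the
// delimiting tags themselves.
const outsidePre = /(?:^|<\/pre>)[^]*?(?:<pre>|$)/g;

// Hypothetical input with one <pre> block in the middle.
const sample = "<p>\n\ta\n</p>\n<pre>\n\tkeep\n</pre>\n<p>b</p>\n";

// With the g flag, match() returns every matched chunk.
const chunks = sample.match(outsidePre);

// chunks[0] runs from the start of the string through the opening <pre>;
// chunks[1] runs from the closing </pre> to the end of the string.
console.log(JSON.stringify(chunks));
// prints: ["<p>\n\ta\n</p>\n<pre>","</pre>\n<p>b</p>\n"]
```

The content of the pre element ("\n\tkeep\n") is not part of any chunk, so a replacement applied to the chunks leaves it untouched.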


Next, each match is a piece of text that needs to be cleaned, so we pass a callback to replace() and clean the match inside it using the regex /[\n\t]+/g.


Example:

    var htmlString = "<body>\n\t<p>\n\t\tLorem\tEpsum\n\t</p>\n\t<pre>\n\t\tHello, World!\n\t</pre>\n\n\t<pre>\n\t\tThis\n\t\tis\n\t\tawesome\n\t</pre>\n\n\n</body>";

    var preview = document.getElementById("preview");
    preview.textContent = htmlString;

    document.getElementById("remove").onclick = function () {
        // Clean only the chunks that lie outside <pre>...</pre>
        preview.textContent = htmlString.replace(/(?:^|<\/pre>)[^]*?(?:<pre>|$)/g, function (m) {
            return m.replace(/[\n\t]+/g, "");
        });
    };

    pre { background: #fffbec; }

    <button id="remove">Remove</button>
    The pre below is only used to display the string; it is not the pre being processed.
    <pre id="preview"></pre>
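
One limitation: the pattern above looks for the literal text <pre>, so an opening tag with attributes, such as <pre class="language-cpp">, is not recognized and its content gets cleaned as well. A possible variant, sketched below, splits the string on whole pre elements instead. This is a different technique from the answer's regex, and the function name stripOutsidePre is hypothetical. It assumes pre elements are not nested and that attribute values contain no ">" character.

```javascript
// Hypothetical helper: remove \n and \t everywhere except inside <pre> elements.
function stripOutsidePre(html) {
  // Splitting on a capture group keeps the matched <pre ...>...</pre> blocks
  // in the result: they land at odd indexes, and the text between them at
  // even indexes.
  const parts = html.split(/(<pre\b[^>]*>[\s\S]*?<\/pre\s*>)/i);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : part.replace(/[\n\t]+/g, "")))
    .join("");
}

const input = "<div>\n\t<pre class=\"language-cpp\">\n\tint x;\n</pre>\n</div>\n";
console.log(JSON.stringify(stripOutsidePre(input)));
// prints: "<div><pre class=\"language-cpp\">\n\tint x;\n</pre></div>"
```

For anything beyond simple, well-formed markup, parsing the string into a DOM and walking its text nodes is likely to be more robust than a regex.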


