Regex to remove all attributes from nested HTML tags - JavaScript

I want to remove all attributes from HTML tags using a regex. The input can contain any HTML element, including nested elements, for example:

<div fadeout"="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
    The event is celebrating its 50th anniversary K&ouml;&nbsp;
    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
       <strong>A festival for art lovers</strong>
    </p>
</div>

Or it could look like this:

<span style="margin: 0;"><p class="abc"> Test text</p></span>

For security reasons, I need to remove the attributes.

What I have tried so far:

s/(<\w+)\s+[^>]*/$1/

<*\b[^<]*>(?:[^<]+(?:<(?!\/?div\b)[^<]*)*|(?R))*<\/*>\s*
<([a-z][a-z0-9]*)[^>]*?(\/?)>

but none of these work.

Regex should not be used to parse HTML.

Instead, use a DOMParser to parse the string, loop over each element's attributes, and call Element.removeAttribute on each one:

const str = `<div fadeout"="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
    The event is celebrating its 50th anniversary K&ouml;&nbsp;
    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
       <strong>A festival for art lovers</strong>
    </p>
</div>`;

function stripAttributes(html) {
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  parsed.body.querySelectorAll('*').forEach(elem =>
    [...elem.attributes].forEach(attr => elem.removeAttribute(attr.name))
  );
  return parsed.body.innerHTML;
}

console.log(stripAttributes(str));

I would advise against using a regex in this case, but if you have no other choice, maybe you are looking for something like this:

/<\s*([a-z][a-z0-9]*)\s.*?>/gi 
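
For illustration, here is a rough sketch of how that pattern might be applied with String.prototype.replace, together with one input where it breaks. The names stripWithRegex, simple and tricky, and the second sample string, are made up for this example. The breakage comes from the lazy .*? stopping at the first > it sees, even when that > sits inside a quoted attribute value.

```javascript
// Pattern from the answer above; $1 in the replacement is the captured tag name.
const attrPattern = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

// Hypothetical helper: replace each matched opening tag with the bare tag.
function stripWithRegex(html) {
  return html.replace(attrPattern, '<$1>');
}

// Simple input taken from the question: works as hoped.
const simple = '<span style="margin: 0;"><p class="abc"> Test text</p></span>';
console.log(stripWithRegex(simple));
// <span><p> Test text</p></span>

// Made-up input with ">" inside an attribute value: the output is mangled.
const tricky = '<a title="a > b" href="x">link</a>';
console.log(stripWithRegex(tricky));
// <a> b" href="x">link</a>
```

Note that this replacement also drops the trailing slash of self-closing tags, so <img src="abc.jpg" alt="" /> becomes <img>.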

The nice thing about working with the DOM is that you have a whole set of tools available to you that were designed specifically for manipulating a DOM! And yet people insist on treating this complex structured data format as though it's just a dumb string and start hacking away at it with regex.

Use the right tool for the job.

function removeAttributesRecursively(el) {
  Array.from(el.attributes).forEach(function(attr) {
    // you'll probably want to include extra logic here to
    // preserve some attributes (a href, img src, etc)
    // instead of blindly removing all of them
    el.removeAttribute(attr.name);
  });
  // recurse:
  Array.from(el.children).forEach(function(child) {
    removeAttributesRecursively(child);
  });
}

const root = document.getElementById('input');
removeAttributesRecursively(root);
console.log(root.innerHTML);

<div id="input">
  <div fadeout="" style="margin:0px;" class="xyz">
    <img src="abc.jpg" alt="" />
    <p style="margin-bottom:10px;">
    The event is celebrating its 50th anniversary K&ouml;&nbsp;
    <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
    </p>
    <p style="padding:0px;"></p>
    <p style="color:black;">
       <strong>A festival for art lovers</strong>
    </p>
  </div>
</div>
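
The comment inside the function hints at preserving some attributes. One way that extra logic might look is a small allowlist keyed by tag name. This is only a sketch: the KEEP map, its contents, and the names isAllowed and removeAttributesExcept are assumptions for illustration, not part of the answer above.

```javascript
// Hypothetical allowlist: lowercase tag name -> attribute names to keep.
const KEEP = new Map([
  ['a', ['href']],
  ['img', ['src', 'alt']],
]);

// True if this attribute should survive on this tag.
function isAllowed(tagName, attrName) {
  const allowed = KEEP.get(tagName.toLowerCase()) || [];
  return allowed.includes(attrName.toLowerCase());
}

// Same recursive walk as above, but skipping allowlisted attributes.
function removeAttributesExcept(el) {
  // Copy the attribute list first, since it changes while we remove entries.
  Array.from(el.attributes).forEach(function (attr) {
    if (!isAllowed(el.tagName, attr.name)) {
      el.removeAttribute(attr.name);
    }
  });
  Array.from(el.children).forEach(removeAttributesExcept);
}

// In a browser this could be called as:
// removeAttributesExcept(document.getElementById('input'));
```

Keeping href or src does not by itself make the markup safe. A value such as a javascript: URL would still pass through, so the kept values probably need their own check if security is the goal.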
