I want to remove the attributes from HTML tags using regex. The input could be any HTML element, and nested elements need to be handled, like:
<div fadeout"="" style="margin:0px;" class="xyz">
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
Or it could look like this:
<span style="margin: 0;"><p class="abc"> Test text</p></span>
For security reasons, I need to remove the attributes.
Here is what I have tried:
s/(<\w+)\s+[^>]*/$1/
<*\b[^<]*>(?:[^<]+(?:<(?!\/?div\b)[^<]*)*|(?R))*<\/*>\s*
<([a-z][a-z0-9]*)[^>]*?(\/?)>
None of these work.
Regex should not be used to parse HTML. Instead, you should use a DOMParser to parse the string, loop through each element's attributes, and use Element.removeAttribute:
const str = `<div fadeout"="" style="margin:0px;" class="xyz"> <img src="abc.jpg" alt="" /> <p style="margin-bottom:10px;"> The event is celebrating its 50th anniversary Kö <a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>. </p> <p style="padding:0px;"></p> <p style="color:black;"> <strong>A festival for art lovers</strong> </p> </div>`

function stripAttributes(html) {
  // Parse the string into a detached document
  const parsed = new DOMParser().parseFromString(html, 'text/html')
  // Copy each element's attribute list before removing, since it is live
  parsed.body.querySelectorAll('*').forEach(elem => [...elem.attributes].forEach(attr => elem.removeAttribute(attr.name)))
  return parsed.body.innerHTML;
}

console.log(stripAttributes(str))
I would advise against using regex in this case, but if you have no other choice, maybe you are looking for something like this:
/<\s*([a-z][a-z0-9]*)\s.*?>/gi
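To make the regex above concrete, here is a minimal sketch of how it might be applied with String.prototype.replace. The helper name stripAttributesWithRegex and the '<$1>' replacement are my assumptions, not part of the answer. The last two calls show where it falls short: the closing slash of a self-closing tag is dropped, and a `>` inside an attribute value ends the match early. Since `.` does not match line breaks, a tag whose attributes span several lines is also left untouched.

```javascript
// Regex from the answer above: an opening tag whose name is followed by
// whitespace, with the tag name captured in group 1.
const TAG_WITH_ATTRIBUTES = /<\s*([a-z][a-z0-9]*)\s.*?>/gi;

// Hypothetical helper: replace each matched tag with the bare tag name.
function stripAttributesWithRegex(html) {
  return html.replace(TAG_WITH_ATTRIBUTES, '<$1>');
}

console.log(stripAttributesWithRegex('<span style="margin: 0;"><p class="abc"> Test text</p></span>'));
// → <span><p> Test text</p></span>

console.log(stripAttributesWithRegex('<img src="abc.jpg" alt="" />'));
// → <img>   (the self-closing slash is lost)

console.log(stripAttributesWithRegex('<a title="a > b" href="x">link</a>'));
// → <a> b" href="x">link</a>   (broken: the match stopped at the quoted ">")
```

The third case is the reason a DOM-based approach is safer for untrusted input.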
The nice thing about working with the DOM is that you have a whole set of tools available to you that were designed specifically for manipulating a DOM! And yet people insist on treating this complex structured data format as though it's just a dumb string and start hacking away at it with regex.
Use the right tool for the job.
function removeAttributesRecursively(el) {
  Array.from(el.attributes).forEach(function(attr) {
    // you'll probably want to include extra logic here to
    // preserve some attributes (a href, img src, etc)
    // instead of blindly removing all of them
    el.removeAttribute(attr.name);
  });
  // recurse:
  Array.from(el.children).forEach(function(child) {
    removeAttributesRecursively(child)
  })
}

const root = document.getElementById('input');
removeAttributesRecursively(root)
console.log(root.innerHTML)
<div id="input">
<div fadeout="" style="margin:0px;" class="xyz">
<img src="abc.jpg" alt="" />
<p style="margin-bottom:10px;">
The event is celebrating its 50th anniversary Kö
<a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a>.
</p>
<p style="padding:0px;"></p>
<p style="color:black;">
<strong>A festival for art lovers</strong>
</p>
</div>
</div>
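The comment in the function above suggests preserving some attributes (a href, img src, etc). One way that might look is sketched below, combined with the DOMParser approach from the first answer. The names ALLOWED_ATTRIBUTES, isAllowedAttribute and stripAttributesExcept are hypothetical, and the allowlist contents are only an example. Note that a kept href can still hold a `javascript:` URL, so if security is the goal the values need checking as well, which this sketch does not do.

```javascript
// Hypothetical allowlist: attribute names to keep, keyed by lowercase tag name.
const ALLOWED_ATTRIBUTES = new Map([
  ['a', ['href']],
  ['img', ['src', 'alt']],
]);

// Decide whether an attribute survives. Pure function, no DOM required.
function isAllowedAttribute(tagName, attrName, allowlist = ALLOWED_ATTRIBUTES) {
  const allowed = allowlist.get(tagName.toLowerCase()) || [];
  return allowed.includes(attrName.toLowerCase());
}

// Parse the HTML, then drop every attribute that is not on the allowlist.
function stripAttributesExcept(html, allowlist = ALLOWED_ATTRIBUTES) {
  const parsed = new DOMParser().parseFromString(html, 'text/html');
  parsed.body.querySelectorAll('*').forEach((elem) => {
    // Copy the live attribute list before mutating it.
    [...elem.attributes].forEach((attr) => {
      if (!isAllowedAttribute(elem.tagName, attr.name, allowlist)) {
        elem.removeAttribute(attr.name);
      }
    });
  });
  return parsed.body.innerHTML;
}

// DOMParser is a browser API, so the demo call is guarded.
if (typeof DOMParser !== 'undefined') {
  console.log(stripAttributesExcept(
    '<p style="margin-bottom:10px;"><a style="margin:0px;" href="http://www.germany.travel/">exhibition grounds in Cologne</a></p>'
  ));
}
```

Keeping the keep/remove decision in its own function means the policy can be changed or tested without touching the DOM traversal.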