Stripping all attributes from an HTML tag using regex

I've been trying to formulate a regular expression to remove any attributes that may be present in HTML tags, but I'm having trouble with it and Google doesn't seem to provide any answers either.

Basically my input string looks something like

<p style="font-family:Arial;" class="x" onclick="doWhatever();">this text</p>
<img style="border:0px" src="pic.gif" />

and I would like to remove any attributes inside the tag to produce a string like:

<p>this text</p>
<img src="pic.gif" />

Does anybody know a regex for doing this? I'm using Regex.Replace in C# by the way.

There are really excellent tools for handling this sort of task in .NET without having to resort to the regex hammer. This will also be more reliable than a regular expression based solution.

I'd suggest that you take a look at HTML Agility Pack.
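
As a rough sketch of what that could look like (assuming the HtmlAgilityPack NuGet package is referenced; the class name AttributeStripper and the keep allowlist are made up for illustration and are not part of the library), you can walk every element and drop each attribute that is not explicitly allowed. The allowlist is there because the desired output in the question keeps src on the img tag:

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package, not part of the .NET base library

public static class AttributeStripper
{
    // Hypothetical helper: removes every attribute whose name is not in 'keep'.
    public static string StripAttributes(string html, params string[] keep)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var elements = doc.DocumentNode.Descendants()
                          .Where(n => n.NodeType == HtmlNodeType.Element);

        foreach (var node in elements)
        {
            // Snapshot the attributes first so we never modify the
            // collection while enumerating it.
            foreach (var attr in node.Attributes.ToList())
            {
                if (!keep.Contains(attr.Name, StringComparer.OrdinalIgnoreCase))
                    attr.Remove();
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling AttributeStripper.StripAttributes(input, "src") should leave only src attributes in place. How the parser re-serializes self-closing tags such as img depends on its settings, so the exact output may differ slightly from the hand-written example in the question.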

HTML is easiest to work with through a DOM, but if you really want to do this with a regex, you could probably take advantage of the fact that you want to remove all attributes, i.e. leave nothing but the bare tag. IMO you should use a DOM parser instead.
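
If you do go the regex route, here is a minimal sketch using only System.Text.RegularExpressions. The class name RegexAttributeStripper, both patterns, and the keep allowlist are my own assumptions, not an established recipe. It matches each opening tag, treats quoted values as units so that a ">" inside a value does not end the tag early, and rebuilds the tag with only the allowed attributes:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexAttributeStripper
{
    // Group 1: tag name. Group 2: raw attribute text. Group 3: optional " /".
    private static readonly Regex Tag = new Regex(
        @"<([a-zA-Z][a-zA-Z0-9-]*)(?=[\s/>])((?:""[^""]*""|'[^']*'|[^'"">])*?)(\s*/)?>");

    // One attribute: a name, optionally followed by a quoted or unquoted value.
    private static readonly Regex Attr = new Regex(
        @"([^\s=/""'>]+)(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'>]+))?");

    // Hypothetical helper: keeps only attributes whose names appear in 'keep'.
    public static string Strip(string html, params string[] keep)
    {
        return Tag.Replace(html, m =>
        {
            var kept = Attr.Matches(m.Groups[2].Value)
                .Cast<Match>()
                .Where(a => keep.Contains(a.Groups[1].Value,
                                          StringComparer.OrdinalIgnoreCase))
                .Select(a => " " + a.Value);

            return "<" + m.Groups[1].Value + string.Concat(kept)
                       + m.Groups[3].Value + ">";
        });
    }
}
```

With the input from the question, RegexAttributeStripper.Strip(input, "src") gives <p>this text</p> and <img src="pic.gif" />, and calling it with no allowlist strips everything. It knows nothing about comments, CDATA, or script bodies, so something like a<b && c>d inside a script block would be mangled, which is exactly why a DOM parser is the safer choice.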

Either that, or use jQuery's each to go through all HTML elements and remove their attributes with removeAttr, or do the same for one particular element. Why would you be doing that anyway?
