
Stripping all attributes from an HTML tag using regex

I've been trying to write a regular expression that removes any attributes present in HTML tags, but I'm having trouble getting it to work, and Google hasn't turned up any answers either.

Basically, my input string looks something like this:

<p style="font-family:Arial;" class="x" onclick="doWhatever();">this text</p>
<img style="border:0px" src="pic.gif" />

and I would like to remove any attributes inside the tags to produce a string like:

<p>this text</p>
<img src="pic.gif" />

Does anybody know a regex for doing this? I'm using Regex.Replace in C#, by the way.

There are excellent tools for handling this sort of task in .NET without resorting to the regex hammer, and they will be more reliable than a regular-expression-based solution.

I'd suggest that you take a look at HTML Agility Pack.
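For illustration, here is a minimal sketch of how that could look with HTML Agility Pack. It assumes the HtmlAgilityPack NuGet package is referenced. The class name `AttributeStripper`, the method `StripAttributes`, and the `keep` parameter are hypothetical names made up for this example, not part of the library. The `keep` list is there only because the sample output in the question retains `src` on the `img` tag.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class AttributeStripper
{
    // Hypothetical helper: removes every attribute from every element,
    // except attributes whose names are listed in 'keep'.
    public static string StripAttributes(string html, params string[] keep)
    {
        var keepSet = new HashSet<string>(keep, StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var elements = doc.DocumentNode
            .Descendants()
            .Where(n => n.NodeType == HtmlNodeType.Element);

        foreach (var node in elements)
        {
            // Copy to a list first so the attribute collection
            // is not modified while it is being enumerated.
            foreach (var attr in node.Attributes.ToList())
            {
                if (!keepSet.Contains(attr.Name))
                {
                    attr.Remove();
                }
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

Calling `AttributeStripper.StripAttributes(input)` would drop every attribute, while `AttributeStripper.StripAttributes(input, "src")` would leave `src` in place. How the parser serializes self-closing tags such as `<img />` may differ from the input, so check the output if exact formatting matters.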

HTML is easiest to work with through a DOM, but if you really want to do this with a regex, you could take advantage of the fact that you want to remove all attributes, i.e. leave nothing but the tag name. IMO you should use a DOM parser instead.
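If you do go the regex route, the "keep only the tag name" idea could be sketched roughly as follows with Regex.Replace. The class and method names are hypothetical, and the pattern is an assumption of mine rather than a tested, general solution. It will misbehave on attribute values that contain `>`, and it will also rewrite tag-like text inside comments or script blocks.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexAttributeStripper
{
    // Group 1: the tag name.
    // [^>]*?  : whatever follows the name (the attributes), matched lazily.
    // Group 2: optional whitespace plus "/" so self-closing tags stay self-closing.
    // Closing tags such as </p> do not match because "/" is not a letter.
    private static readonly Regex OpeningTag = new Regex(
        @"<([A-Za-z][A-Za-z0-9]*)\b[^>]*?(\s*/?)>",
        RegexOptions.Compiled);

    // Hypothetical helper: rewrites each opening tag as just "<name>" or "<name />".
    public static string StripAttributes(string html)
    {
        return OpeningTag.Replace(html, "<$1$2>");
    }
}
```

For the input in the question, this turns the first line into `<p>this text</p>` and the second into `<img />`. Note that this removes `src` as well, unlike the sample output in the question, so keeping selected attributes would need a more involved pattern or a DOM parser.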

Either that, or use jQuery's each to go through all the HTML elements and remove their attributes, or remove them from one particular element. Why would you be doing that anyway?
