I'm looking for a regular expression to isolate an HTML tag. This includes the TAG, the ATTRIBUTES, and the CONTENT inside.
Let's say I have this:
<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>
I need a regular expression that would return:
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
Thanks, Joe
Don't use a regex, use an HTML parser instead. Much more reliable and easier to work with.
If you're a PHP developer I recommend you use this one (http://simplehtmldom.sourceforge.net/).
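For readers not using PHP, here is a minimal sketch of the same parser-based idea using Python's standard-library `html.parser`. The names `ElementExtractor` and `extract_elements` are made up for this illustration, and the sketch assumes the whole document is passed in one `feed()` call and that every target tag is properly closed.

```python
from html.parser import HTMLParser


class ElementExtractor(HTMLParser):
    """Collect the verbatim source text of each outermost <tag_name> element."""

    def __init__(self, source, tag_name):
        super().__init__()
        self.source = source
        self.tag_name = tag_name.lower()  # HTMLParser reports tag names in lowercase
        # Absolute index at which each line starts, used to convert getpos() results.
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.depth = 0      # how many target elements are currently open
        self.start = None   # absolute index of the outermost opening tag
        self.elements = []

    def _position(self):
        line, column = self.getpos()  # 1-based line, 0-based column
        return self.line_starts[line - 1] + column

    def handle_starttag(self, tag, attrs):
        if tag == self.tag_name:
            if self.depth == 0:
                self.start = self._position()
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag_name and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # getpos() points at the "<" of the end tag; extend to its ">".
                end = self.source.index(">", self._position()) + 1
                self.elements.append(self.source[self.start:end])


def extract_elements(source, tag_name):
    parser = ElementExtractor(source, tag_name)
    parser.feed(source)
    parser.close()
    return parser.elements


html = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>"""

print(extract_elements(html, "TAGNAME"))
# ['<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>']
```

Because the parser tracks nesting depth, a TAGNAME nested inside another TAGNAME is returned as part of the outer element rather than cutting it short.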
Take a look at the HTML Agility Pack; it will make things much easier.
Use this regular expression: <TAGNAME.+?</TAGNAME>
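As a quick illustration of how this lazy pattern behaves, here is a small sketch in Python (chosen only for the demo; the thread does not say which language Joe is using). `re.DOTALL` lets `.` match newlines, which is Python's counterpart of .NET's `RegexOptions.Singleline`. The second input is an invented nested example that shows where the pattern stops too early.

```python
import re

# Lazy match from the first "<TAGNAME" to the nearest "</TAGNAME>".
pattern = re.compile(r"<TAGNAME.+?</TAGNAME>", re.DOTALL)

html = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>"""

simple = pattern.search(html).group(0)
print(simple)
# <TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>

# With nested tags the lazy match ends at the first closing tag it meets,
# so the outer element is truncated.
nested = "<TAGNAME>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME>"
truncated = pattern.search(nested).group(0)
print(truncated)
# <TAGNAME>outer <TAGNAME>inner</TAGNAME>
```

This works for the flat example in the question, but it is only safe when TAGNAME never contains another TAGNAME.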
If this is the main thing you're trying to do, XSLT is a good tool to do it with. You can easily select just TAGNAME and copy over the attributes and text. See http://www.w3schools.com/xsl/ for an intro.
First of all: don't do this. Parsing HTML with regex is a maintenance nightmare and will most probably fail on any real-world example of HTML. There are better options, such as an HTML parser like the HTML Agility Pack.
To answer your question though, the following regex will do what you want if the HTML code
It can be expanded to cover some of these cases, but you really don't want to =)
<TAGNAME(<TAGNAME (?<tagcounter>)|</TAGNAME>(?<-tagcounter>)|.)*</TAGNAME>(?(tagcounter)(?!))
You'd need RegexOptions.Singleline, too. See it in action at Ideone.com
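The balancing group in the regex above is specific to .NET: it pushes a counter for every nested opening tag and pops one for every closing tag. For engines without that feature, the same counting idea can be sketched with an explicit depth counter. The Python version below is an illustration, not a translation of the regex; `find_balanced` is a hypothetical helper, and it assumes attribute values contain no `>` and that TAGNAME is never written as a self-closing tag.

```python
import re


def find_balanced(source, tag_name):
    """Return each outermost <tag_name>...</tag_name> span, counting nesting."""
    # Matches either an opening or a closing tag; group 1 is "/" for a closing tag.
    token = re.compile(r"<(/?)" + re.escape(tag_name) + r"\b[^>]*>")
    depth = 0
    start = None
    results = []
    for m in token.finditer(source):
        if m.group(1):          # closing tag
            if depth == 0:
                continue        # stray closing tag, ignore it
            depth -= 1
            if depth == 0:
                results.append(source[start:m.end()])
        else:                   # opening tag
            if depth == 0:
                start = m.start()
            depth += 1
    return results


nested = "<TAGNAME>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME>"
print(find_balanced(nested, "TAGNAME"))
# ['<TAGNAME>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME>']
```

It still shares the weaknesses of any regex-based approach (comments, scripts, malformed markup), which is why a real parser remains the better choice.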