Regular Expression to isolate an html tag

Question

I'm looking for a regular expression to isolate an HTML tag. This includes the TAG, the ATTRIBUTES, and the CONTENT inside.

Let's say I have this:

<html> 
<body>
aajsdfkjaskd 
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
 </html>

I need a regular expression that would return:

<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>

Thanks, Joe

Answer 1

Don't use a regex, use an HTML parser instead. Much more reliable and easier to work with.

If you're a PHP developer, I recommend this one (http://simplehtmldom.sourceforge.net/).
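For readers not on PHP, here is a minimal sketch of the same parser-based idea using Python's standard-library `html.parser`. The class name `TagIsolator` and the variable `sample` are made up for illustration, and the sketch returns the attributes and text of each outermost TAGNAME element rather than the raw source string.

```python
from html.parser import HTMLParser


class TagIsolator(HTMLParser):
    """Collect the attributes and text of every outermost <target> element."""

    def __init__(self, target):
        super().__init__()
        self.target = target.lower()  # html.parser reports tag names in lowercase
        self.depth = 0                # number of currently open target elements
        self.attrs = None             # attributes of the outermost open target
        self.text = []                # text pieces seen inside it
        self.found = []               # (attrs_dict, text) per completed element

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            if self.depth == 0:
                self.attrs = dict(attrs)
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.target and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.found.append((self.attrs, "".join(self.text)))
                self.text = []

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


# The HTML from the question.
sample = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
 </html>"""

parser = TagIsolator("TAGNAME")
parser.feed(sample)
parser.close()
print(parser.found)
```

Because the parser tracks open and close tags itself, nested TAGNAME elements and attribute quoting do not need special handling in a pattern.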

Answer 2

Take a look at the HTML Agility Pack; it will make things much easier.

Answer 3

Use this regular expression: <TAGNAME.+?</TAGNAME>
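A quick way to see what this lazy pattern does and does not handle, sketched in Python with the `re` module. The `re.DOTALL` flag is an addition here so that `.` also matches newlines inside the element; the `nested` string is a made-up input, not part of the question.

```python
import re

# The HTML from the question.
sample = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
 </html>"""

# Lazy match from the opening tag to the first closing tag.
pattern = re.compile(r"<TAGNAME.+?</TAGNAME>", re.DOTALL)

simple = pattern.search(sample).group(0)
print(simple)

# Hypothetical nested input: the lazy match stops at the first </TAGNAME>,
# so the outer element is cut short.
nested = "<TAGNAME a='1'>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME>"
truncated = pattern.search(nested).group(0)
print(truncated)
```

The pattern is fine for the flat example in the question, but it truncates the outer element as soon as TAGNAME elements are nested.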

Answer 4

If this is the main thing you're trying to do, XSLT is a good tool to do it with. You can easily select just TAGNAME and copy over the attributes and text. See http://www.w3schools.com/xsl/ for an intro.

Answer 5

First of all: don't do this. Parsing HTML with regex is a maintenance nightmare and will most probably fail on any real-world HTML. There are better options, such as using an HTML parser like the HTML Agility Pack.

To answer your question though, the following regex will do what you want if the HTML code:

- is well formed (no missing closing tags, etc.)
- does not contain comments with "TAGNAME" in them
- does not contain script blocks with "TAGNAME" in them
- possibly meets other conditions not listed here

It can be expanded to cover some of these cases, but you really don't want to =)

    <TAGNAME(<TAGNAME (?<tagcounter>)|</TAGNAME>(?<-tagcounter>)|.)*</TAGNAME>(?(tagcounter)(?!))

You'd need RegexOptions.Singleline, too. See it in action at Ideone.com
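The balancing-group syntax above is specific to .NET. As a rough illustration of what the counter is doing, here is a Python sketch that swaps in an explicit depth variable, since Python's `re` module has no balancing groups. The function name `isolate_balanced` and the test string are made up, and the same caveats apply (well-formed input, no "TAGNAME" inside comments or scripts).

```python
import re

# Matches either the start of an opening tag or a complete closing tag.
TOKEN = re.compile(r"<TAGNAME\b|</TAGNAME>")


def isolate_balanced(html):
    """Return the source text of each outermost, properly closed TAGNAME element."""
    results = []
    depth = 0      # plays the role of the 'tagcounter' stack in the .NET regex
    start = None   # offset where the current outermost element begins
    for m in TOKEN.finditer(html):
        if m.group(0) != "</TAGNAME>":
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth:                     # ignore stray closing tags
            depth -= 1
            if depth == 0:
                results.append(html[start:m.end()])
    return results


# Hypothetical nested input.
nested = "x <TAGNAME a='1'>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME> y"
balanced = isolate_balanced(nested)
print(balanced)
```

An element that is never closed simply produces no result, which roughly mirrors the final `(?(tagcounter)(?!))` check that rejects an unbalanced match.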

5 answers

solution1
2 2012-07-11 15:15:21

solution2
1 2012-07-11 15:16:32

solution3
0 2012-07-11 15:13:14

solution4
0 2012-07-11 15:16:04

solution5
0 2012-07-11 15:41:48
