
Regular Expression to isolate an html tag

I'm looking for a regular expression to isolate an HTML tag. This includes the tag itself, its attributes, and the content inside.

Let's say I have this:

<html> 
<body>
aajsdfkjaskd 
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
 </html>

I need a regular expression that would return:

<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>

Thanks, Joe

Don't use a regex, use an HTML parser instead. Much more reliable and easier to work with.

If you're a PHP developer I recommend you use this one (http://simplehtmldom.sourceforge.net/).
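For anyone not on PHP, the same parser-first idea can be sketched with Python's standard-library html.parser. This is only an illustration, not the library recommended above: the names TagCollector and extract_elements are made up for this example, and it returns the attributes and text of each outermost match rather than the raw markup. Note that html.parser lowercases tag and attribute names.

```python
from html.parser import HTMLParser

SAMPLE = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>"""


class TagCollector(HTMLParser):
    """Hypothetical helper: records attributes and text of each outermost target element."""

    def __init__(self, target):
        super().__init__()
        self.target = target.lower()  # the parser reports tag names in lowercase
        self.depth = 0                # nesting depth of currently open target tags
        self.current = None
        self.elements = []

    def handle_starttag(self, tag, attrs):
        if tag != self.target:
            return
        if self.depth == 0:
            self.current = {"attrs": dict(attrs), "text": []}
        self.depth += 1

    def handle_endtag(self, tag):
        if tag != self.target or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            # Outermost element closed: join the collected text pieces.
            self.current["text"] = "".join(self.current["text"])
            self.elements.append(self.current)
            self.current = None

    def handle_data(self, data):
        if self.depth:
            self.current["text"].append(data)


def extract_elements(html, target):
    collector = TagCollector(target)
    collector.feed(html)
    collector.close()
    return collector.elements


elements = extract_elements(SAMPLE, "TAGNAME")
print(elements)
```

Because the parser tracks open and close tags itself, nested or multi-line elements need no special handling in the calling code.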

Take a look at the HTML Agility Pack; it will make things much easier.

Use this regular expression: <TAGNAME.+?</TAGNAME>
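A quick way to try that pattern, sketched here in Python (the question does not name a language, so this is an assumption). The DOTALL flag is added so that . also matches newlines, which the pattern as posted does not require by itself. The second input is a made-up nested example showing where the non-greedy match stops early.

```python
import re

SAMPLE = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>"""

# Non-greedy: from the first "<TAGNAME" to the nearest "</TAGNAME>".
PATTERN = re.compile(r"<TAGNAME.+?</TAGNAME>", re.DOTALL)

simple_match = PATTERN.search(SAMPLE).group(0)
print(simple_match)

# Hypothetical nested input: the match ends at the inner closing tag,
# so the outer element is cut short.
NESTED = "<TAGNAME a='1'>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME>"
nested_match = PATTERN.search(NESTED).group(0)
print(nested_match)
```

So the pattern is fine for flat, well-formed input like the sample, but it cannot pair up nested TAGNAME elements.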

If this is the main thing you're trying to do, XSLT is a good tool to do it with. You can easily select just TAGNAME and copy over the attributes and text. See http://www.w3schools.com/xsl/ for an intro.
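Python's standard library has no XSLT processor, so the following is not XSLT. It only sketches the kind of path-based selection an XSLT template would rely on, using xml.etree.ElementTree, and it assumes the input is well-formed XML (which the sample happens to be; real-world HTML often is not).

```python
import xml.etree.ElementTree as ET

SAMPLE = """<html>
<body>
aajsdfkjaskd
<TAGNAME name="bla" context="non">hfdfhdj </TAGNAME>
</body>
</html>"""

root = ET.fromstring(SAMPLE)

# ".//TAGNAME" selects the first TAGNAME element anywhere below the root.
element = root.find(".//TAGNAME")

# The whitespace after the closing tag is stored as the element's tail;
# clear it so only the element itself is serialized.
element.tail = None
serialized = ET.tostring(element, encoding="unicode")

print(element.attrib)
print(element.text)
print(serialized)
```

Unlike the HTML parsers mentioned in the other answers, this raises a ParseError on malformed markup instead of recovering from it.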

First of all: don't do this. Parsing HTML with regex is a maintenance nightmare and will most probably fail on any real-world example of HTML. There are better options, such as an HTML parser like the HTML Agility Pack.

To answer your question though, the following regex will do what you want if the HTML code

  • is well formed (no missing closing tag, etc)
  • does not contain comments with "TAGNAME" in them
  • does not contain script blocks with "TAGNAME" in them
  • meets possibly further conditions not listed here

It can be expanded to cover some of these cases, but you really don't want to =)

    <TAGNAME(<TAGNAME (?<tagcounter>)|</TAGNAME>(?<-tagcounter>)|.)*</TAGNAME>(?(tagcounter)(?!))

You'd need RegexOptions.Singleline, too, so that . also matches newlines. See it in action at Ideone.com
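The tagcounter construct in that pattern is a .NET balancing group, which most other regex engines (Python's re included) do not support. As a rough sketch of the same counting idea outside .NET, one can scan for opening and closing tags and keep the depth by hand. This is a swapped-in technique, not a translation of the regex above; the function name find_balanced and the NESTED input are made up for illustration, and the same caveats about comments and script blocks apply.

```python
import re

# Matches either the start of an opening tag or a complete closing tag.
TOKEN = re.compile(r"<TAGNAME\b|</TAGNAME>")


def find_balanced(html):
    """Return the source text of every outermost, properly closed TAGNAME element."""
    results = []
    depth = 0      # how many TAGNAME elements are currently open
    start = None   # offset where the current outermost element begins
    for token in TOKEN.finditer(html):
        if token.group(0) != "</TAGNAME>":
            if depth == 0:
                start = token.start()
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                results.append(html[start:token.end()])
    return results


NESTED = "x <TAGNAME a='1'>outer <TAGNAME>inner</TAGNAME> tail</TAGNAME> y"
balanced = find_balanced(NESTED)
print(balanced)
```

An element that is never closed is simply not reported, which mirrors the requirement that the counter be empty at the end of the match.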


 