
JavaScript regex to find specific HTML tag details

I want a regex to extract the details of a specific HTML tag.

I tried the following two regexes:

<\s*tag[^>]*>(.*?)<\s*/\s*tag>

<tag[^<>]*>.+?<\/tag>

Below are the two test cases for the first regex.

In the first example I get the correct result, but in the second example I get the wrong result, even though the inputs are almost the same in both cases.

In the first case the tags are individual strings; in the second case they form a single string.

===================================
Example 1 Input
===================================
<tagX>AAA</tagX>
<tag>GGG</tag>
<tag id="tag896">HHH</tag>
<tagY>III</tagY>
<tag id="tag017">JJJ</tag>
<tag>KKK</tag>
===================================
Output 1 // Correct
===================================
<tag>GGG</tag>
GGG
<tag id="tag896">HHH</tag>
HHH
<tag id="tag017">JJJ</tag>
JJJ
<tag>KKK</tag>
KKK


===================================
Example 2 Input (as a single string)
===================================
<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag><tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>
===================================
Output 2 // Wrong
===================================
<tagX>AAA</tagX><tag>GGG</tag>
AAA</tagX><tag>GGG

<tag id="tag896">HHH</tag>
HHH

<tagY>III</tagY><tag id="tag017">JJJ</tag>
III</tagY><tag id="tag017">JJJ

<tag>KKK</tag>
KKK

What I want is only the details of (tag), but in the second case it fetches the (tag) + (tagX) + (tagY) details.

My input is similar to the second input...

It's a little urgent... can I get a solution for this?

Thanks...

The problem with the regular expressions you've written is that they allow <tagX> (for example) to be the opening tag if there is a </tag> that supposedly closes it on the same line.
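To see this failure concretely, here is a small sketch that runs the first pattern from the question on shortened versions of the two inputs. The variable names are only illustrative, and the `g` flag is an addition so that `matchAll` can return every match.

```javascript
// The first pattern from the question, with the g flag so every match is returned.
const loose = /<\s*tag[^>]*>(.*?)<\s*\/\s*tag>/g;

// Shortened versions of the two inputs from the question.
const multiLine = [
  '<tagX>AAA</tagX>',
  '<tag>GGG</tag>',
  '<tag id="tag896">HHH</tag>',
].join('\n');
const singleLine = '<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>';

// "." does not match a newline, so a match cannot run from <tagX> to a </tag> on a later line.
const perLine = [...multiLine.matchAll(loose)].map((m) => m[0]);

// On a single line, [^>]* absorbs the "X" and .*? runs on to the first </tag>.
const inline = [...singleLine.matchAll(loose)].map((m) => m[0]);

console.log(perLine); // two matches: <tag>GGG</tag> and <tag id="tag896">HHH</tag>
console.log(inline);  // first match is <tagX>AAA</tagX><tag>GGG</tag>
```

The only difference between the two runs is whether a newline stops `.*?` before it reaches a `</tag>`.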

The problem with using regular expressions in this case is that you might get a bad result if the XML is:

<tag></tag>
<tagX></tagX>
<tag></tag>

If all the tags are on one line, a single match could swallow the whole thing, so be very careful.

I'd work with something like this (it works with the above example):

 <\s*tag((\s+[^<>]+\s*>)|(\s*>))[^<>]*<\s*\/tag\s*>

Here I allow all the whitespace that is valid, but I don't allow nested tags, so the above example will work. Moreover, if you allow nested tags, no regex will work. Look at this example:

<tag> <tagX> <tag> </tag> </tagX> </tag>

For this example, a pattern that allows nested tags would give you <tag> <tagX> <tag> </tag> as a valid answer, which is wrong.
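As a rough sketch of how the no-nesting pattern behaves in JavaScript, the snippet below applies it to the single-string input from the question and to the nested example. Two things are adaptations rather than part of the answer: the inner groups are made non-capturing, and a capturing group is placed around `[^<>]*` so the tag content can be read from `m[1]`. All variable names are illustrative.

```javascript
// The answer's pattern, adapted: non-capturing alternation, plus a group around the content.
const noNesting = /<\s*tag(?:\s+[^<>]+\s*>|\s*>)([^<>]*)<\s*\/tag\s*>/g;

const flatInput =
  '<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>' +
  '<tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>';

// After "tag" the pattern requires whitespace or ">", so <tagX> and <tagY> cannot open a match.
const contents = [...flatInput.matchAll(noNesting)].map((m) => m[1]);
console.log(contents); // GGG, HHH, JJJ, KKK

// With nesting, [^<>]* cannot cross the inner markup, so only the innermost <tag> matches.
const nestedInput = '<tag> <tagX> <tag> </tag> </tagX> </tag>';
const nestedMatches = [...nestedInput.matchAll(noNesting)].map((m) => m[0]);
console.log(nestedMatches); // a single match: <tag> </tag>
```

Note that the outer `<tag>` in the nested input is not reported at all, which is the trade-off of forbidding `<` and `>` inside the content.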

I tried the regular expression below and it works fine...

<tag( [^<>]+)?>(.+?)<\/tag>
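A minimal sketch of using this pattern from JavaScript to produce the tag and its content, as in the question's first output. The helper name `findTags` and the result field names are hypothetical, and the `g` flag is added so that `matchAll` can iterate over all matches.

```javascript
// Pattern from the answer above, with the g flag added for matchAll.
const tagPattern = /<tag( [^<>]+)?>(.+?)<\/tag>/g;

// Hypothetical helper: returns the full element, its raw attribute text, and its content.
function findTags(input) {
  return [...input.matchAll(tagPattern)].map((m) => ({
    outer: m[0],
    attrs: (m[1] || '').trim(),
    inner: m[2],
  }));
}

const sample =
  '<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>' +
  '<tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>';

// Prints each matched element followed by its content.
for (const { outer, inner } of findTags(sample)) {
  console.log(outer);
  console.log(inner);
}
```

One caveat with this pattern: `.+?` needs at least one character, so an empty `<tag></tag>` followed by other tags on the same line would be matched together with whatever comes before the next `</tag>`.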

If you are using .NET (and, for some reason, you are sure about your XML and don't need to use Html Agility Pack), you could try this:

<tag(?:>|(?: .*?>))(.*?)</tag>


 