I want a regex to extract the details of a specific HTML tag.
I tried the two regexes below:
<\s*tag[^>]*>(.*?)<\s*/\s*tag>
<tag[^<>]*>.+?<\/tag>
Below are two test cases for the first regex.
In the first example I get the correct result, but in the second example I get a wrong result, even though the inputs are almost the same.
In the first case each element is a separate string; in the second case everything is a single string.
===================================
Example 1 Input
===================================
<tagX>AAA</tagX>
<tag>GGG</tag>
<tag id="tag896">HHH</tag>
<tagY>III</tagY>
<tag id="tag017">JJJ</tag>
<tag>KKK</tag>
===================================
Output 1 // Correct
===================================
<tag>GGG</tag>
GGG
<tag id="tag896">HHH</tag>
HHH
<tag id="tag017">JJJ</tag>
JJJ
<tag>KKK</tag>
KKK
===================================
Example 2 Input (as a single string)
===================================
<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag><tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>
===================================
Output 2 // Wrong
===================================
<tagX>AAA</tagX><tag>GGG</tag>
AAA</tagX><tag>GGG
<tag id="tag896">HHH</tag>
HHH
<tagY>III</tagY><tag id="tag017">JJJ</tag>
III</tagY><tag id="tag017">JJJ
<tag>KKK</tag>
KKK
What I want is only the details of (tag), but in the second case it is fetching the (tag) + (tagX) + (tagY) details.
My real input is similar to the second input.
It's a little urgent. Can I get a solution for this?
Thanks.
The problem with the regular expressions you've written is that they allow <tagX>
(for example) to be the opening tag as long as there is a </tag> that supposedly closes it on the same line.
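To make the difference between the two inputs concrete, here is a small Python sketch that runs the first pattern from the question against a shortened version of both inputs. Python and the variable names are my own choice for illustration (the question does not say which engine is in use), but the behaviour shown relies only on "." not matching a newline by default, which is common to most engines.

```python
import re

# First pattern from the question, unchanged.
pattern = re.compile(r"<\s*tag[^>]*>(.*?)<\s*/\s*tag>")

multi_line = '<tagX>AAA</tagX>\n<tag>GGG</tag>\n<tag id="tag896">HHH</tag>'
single_line = multi_line.replace("\n", "")

# "." does not match a newline by default, so a match that starts at <tagX>
# cannot reach the </tag> on the next line and is abandoned.
multi_matches = [m.group(0) for m in pattern.finditer(multi_line)]

# On one line, "[^>]*" accepts the "X" in <tagX> and ".*?" then runs on
# to the first literal </tag>.
single_matches = [m.group(0) for m in pattern.finditer(single_line)]

print(multi_matches)   # ['<tag>GGG</tag>', '<tag id="tag896">HHH</tag>']
print(single_matches)  # ['<tagX>AAA</tagX><tag>GGG</tag>', '<tag id="tag896">HHH</tag>']
```

So the line breaks in Example 1 were hiding the bug rather than the pattern being correct there.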
The problem with using regular expressions in this case is that you might get a bad result if the XML is:
<tag></tag>
<tagX></tagX>
<tag></tag>
If all the tags are on one line, you could get the whole thing back as a single match, so be very careful.
I'd work with something like this (it works with the above example):
<\s*tag((\s+[^<>]+\s*>)|(\s*>))[^<>]*<\s*\/tag\s*>
Here I allow all the whitespace that is valid, but I don't allow nested tags, so the above example will work. Moreover, if you allow nested tags, no regex will work. Look at this example:
<tag> <tagX> <tag> </tag> </tagX> </tag>
In this example, a pattern that permits nested tags will return <tag> <tagX> <tag> </tag>
as a valid match, even though that fragment is not a balanced element.
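As a rough check of the points above, this Python sketch runs the stricter pattern against the single-line input from the question and against the nested example. The choice of Python and the variable names are assumptions for illustration only. For contrast, it also runs the second pattern from the question on the nested example.

```python
import re

# Stricter pattern from this answer, unchanged.
strict = re.compile(r"<\s*tag((\s+[^<>]+\s*>)|(\s*>))[^<>]*<\s*\/tag\s*>")
# Second pattern from the question, which lets the content contain other tags.
loose = re.compile(r"<tag[^<>]*>.+?<\/tag>")

single_line = ('<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>'
               '<tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>')
nested = "<tag> <tagX> <tag> </tag> </tagX> </tag>"

# After "<tag" only whitespace or ">" may follow, so <tagX> and <tagY> are rejected.
flat_matches = [m.group(0) for m in strict.finditer(single_line)]

# The content may not contain "<" or ">", so only the innermost pair matches.
nested_strict = [m.group(0) for m in strict.finditer(nested)]

# The loose pattern stops at the first </tag> and returns an unbalanced fragment.
nested_loose = [m.group(0) for m in loose.finditer(nested)]

print(flat_matches)
print(nested_strict)  # ['<tag> </tag>']
print(nested_loose)   # ['<tag> <tagX> <tag> </tag>']
```

Note that this pattern has no capture group for the content itself; group 1 holds the tail of the opening tag, so you would need to wrap `[^<>]*` in parentheses to pull out GGG, HHH and so on.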
I tried the regex below and it works fine:
<tag( [^<>]+)?>(.+?)<\/tag>
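Here is a quick way to try this pattern on the single-line input from the question and print the output in the same shape the asker wants (full match, then content). This is only a sketch in Python, which is my assumption since the thread names no language; the content ends up in group 2 because group 1 is the optional attribute part.

```python
import re

# Pattern from this answer, unchanged.
pattern = re.compile(r"<tag( [^<>]+)?>(.+?)<\/tag>")

single_line = ('<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>'
               '<tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>')

for m in pattern.finditer(single_line):
    print(m.group(0))  # whole element, e.g. <tag id="tag896">HHH</tag>
    print(m.group(2))  # content only, e.g. HHH

contents = [m.group(2) for m in pattern.finditer(single_line)]

# Caveat: ".+?" needs at least one character, so an empty <tag></tag>
# makes the match run on into the next element.
empty_case = pattern.search("<tag></tag><tag>A</tag>").group(2)
print(empty_case)  # </tag><tag>A
```

If empty elements can occur in your input, changing `(.+?)` to `(.*?)` avoids that last problem.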
If you are using .NET (and for some reason you are sure of your XML and do not need to use Html Agility Pack), you could try this:
<tag(?:>|(?: .*?>))(.*?)</tag>
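If you are not that sure of your markup, a parser is the safer route. Html Agility Pack is .NET only, so purely as an illustration here is the same idea sketched with Python's standard-library `html.parser`. The `TagCollector` class is a hypothetical helper of mine, not part of any library, and it assumes you want the text of each outermost `tag` element. Note that `HTMLParser` lowercases tag names, so `<tagX>` is reported as `tagx` and never confused with `tag`.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical helper: collect (attributes, text) for each outermost element."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.depth = 0      # how many matching elements are currently open
        self.attrs = None   # attributes of the outermost open element
        self.buffer = []    # text pieces seen inside it
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag == self.name:
            if self.depth == 0:
                self.attrs = dict(attrs)
                self.buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.name and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append((self.attrs, "".join(self.buffer)))

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


single_line = ('<tagX>AAA</tagX><tag>GGG</tag><tag id="tag896">HHH</tag>'
               '<tagY>III</tagY><tag id="tag017">JJJ</tag><tag>KKK</tag>')

collector = TagCollector("tag")
collector.feed(single_line)
collector.close()
parsed = collector.results
print(parsed)
```

Because the collector counts open elements, a nested input such as `<tag> <tagX> <tag> </tag> </tagX> </tag>` is reported as one outer element instead of an unbalanced fragment.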