
Regex: match a tag with specific attributes

I am aware that regex is not recommended for parsing HTML. I have an element named tag. It can have many possible attributes, but three of them are required: attr, bttr and cttr. Their names are known, but the values assigned to them are not. I need a regex that matches all of these examples:

<tag attr="0" bttr="0" cttr="0" />
<tag attr="0" cttr="0" bttr="0" />
<tag bttr="0" attr="0" cttr="0" />
<tag bttr="0" cttr="0" attr="0" />
<tag cttr="0" attr="0" bttr="0" />
<tag cttr="0" bttr="0" attr="0" />

There may also be other attributes (but not necessarily), for example:

<tag attr="0" cttr="0" bttr="0" dar-vienas="0" />
<tag bttr="0" cttr="0" dar-vienas="0" attr="0" />
<tag attr="0" dar-vienas="0" cttr="0" irdar-vienas="0" bttr="0" />

All of these must match. This one, however, must not match:

<tag attr="0" dar-vienas="0" bttr="0" irdar-vienas="0" />

It is missing the cttr attribute, so it must not match. So, what's the regex? So far, all my attempts have failed.

We can use lookaheads, each asserting that one of the attr, bttr and cttr attributes occurs inside the <tag>:

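<tag (?=((?!\/>).)*\battr="[^"]*")(?=((?!\/>).)*\bbttr="[^"]*")(?=((?!\/>).)*\bcttr="[^"]*").*?\/>

Since the attribute values are not known, each value is matched with "[^"]*", i.e. any double-quoted string that contains no quote characters.

The pattern contains three lookaheads, one per required attribute. Here is how the first one works:

(?=                    lookahead
    ((?!\/>).)*        consume characters one at a time, never passing the /> that ends the <tag>,
    \battr="[^"]*"     then assert that attribute 'attr', with any quoted value, appears inside the tag
)

Lookaheads do not consume input, so all three start scanning from the same position (right after "<tag "). That is why the required attributes may appear in any order and be mixed with other attributes. The final .*?\/> then consumes the tag itself.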
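To check the pattern against the examples from the question, here is a minimal sketch in Python's re module (the question does not name a language, so this choice is an assumption). It matches each value with "[^"]*" rather than a literal "0", since the values are not known in advance, and it assumes values are double-quoted with no embedded quotes. PATTERN, SHOULD_MATCH, SHOULD_NOT_MATCH and matches are illustrative names.

```python
import re

# Three lookaheads (one per required attribute), each scanning forward from
# just after "<tag " without crossing the closing "/>".
# Assumption: values are double-quoted and never contain a '"' character.
PATTERN = re.compile(
    r'<tag (?=((?!\/>).)*\battr="[^"]*")'
    r'(?=((?!\/>).)*\bbttr="[^"]*")'
    r'(?=((?!\/>).)*\bcttr="[^"]*")'
    r'.*?\/>'
)

# Examples taken from the question.
SHOULD_MATCH = [
    '<tag attr="0" bttr="0" cttr="0" />',
    '<tag attr="0" cttr="0" bttr="0" />',
    '<tag bttr="0" attr="0" cttr="0" />',
    '<tag bttr="0" cttr="0" attr="0" />',
    '<tag cttr="0" attr="0" bttr="0" />',
    '<tag cttr="0" bttr="0" attr="0" />',
    '<tag attr="0" cttr="0" bttr="0" dar-vienas="0" />',
    '<tag bttr="0" cttr="0" dar-vienas="0" attr="0" />',
    '<tag attr="0" dar-vienas="0" cttr="0" irdar-vienas="0" bttr="0" />',
]
SHOULD_NOT_MATCH = [
    '<tag attr="0" dar-vienas="0" bttr="0" irdar-vienas="0" />',  # no cttr
]


def matches(s):
    """Return True if s contains a <tag ... /> with attr, bttr and cttr."""
    return PATTERN.search(s) is not None


if __name__ == "__main__":
    for s in SHOULD_MATCH + SHOULD_NOT_MATCH:
        print(matches(s), s)
```

The \b in front of each attribute name keeps, for example, xattr="0" from being counted as attr.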

Note that you were right when you said that regex should not be used to parse HTML. A better solution here would probably be a proper HTML/DOM parser.
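As a minimal sketch of the parser-based approach, here is a version built on Python's standard-library html.parser (an assumption; any HTML/DOM parser would do). TagFinder, find_tags and REQUIRED are hypothetical names. The parser hands over the attributes as (name, value) pairs, so checking for the three required names becomes a set comparison, and neither their order nor any extra attributes matter.

```python
from html.parser import HTMLParser

# Attribute names that must all be present (taken from the question).
REQUIRED = {"attr", "bttr", "cttr"}


class TagFinder(HTMLParser):
    """Collects the attributes of every <tag> that has all REQUIRED names."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <tag ... /> end up here too: the default
        # handle_startendtag() calls handle_starttag() and handle_endtag().
        names = {name for name, _ in attrs}
        if tag == "tag" and REQUIRED <= names:
            self.matches.append(dict(attrs))


def find_tags(html):
    """Return one attribute dict per matching <tag> in html."""
    parser = TagFinder()
    parser.feed(html)
    parser.close()
    return parser.matches
```

For example, find_tags('<tag cttr="7" bttr="x" attr="5" />') returns a single dict whose "attr" entry is "5", while the question's example without cttr yields an empty list.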
