I am aware that regex is not recommended for parsing HTML. I have a tag named tag that can carry many attributes, three of which are required. The names of the required attributes, attr, bttr and cttr, are known, but the values assigned to them are not. I need a regex that matches these examples:
<tag attr="0" bttr="0" cttr="0" />
<tag attr="0" cttr="0" bttr="0" />
<tag bttr="0" attr="0" cttr="0" />
<tag bttr="0" cttr="0" attr="0" />
<tag cttr="0" attr="0" bttr="0" />
<tag cttr="0" bttr="0" attr="0" />
There may also be other attributes, though not necessarily, for example:
<tag attr="0" cttr="0" bttr="0" dar-vienas="0" />
<tag bttr="0" cttr="0" dar-vienas="0" attr="0" />
<tag attr="0" dar-vienas="0" cttr="0" irdar-vienas="0" bttr="0" />
All of these have to match. This one must not match:
<tag attr="0" dar-vienas="0" bttr="0" irdar-vienas="0" />
It is missing the cttr attribute, so it cannot match. So what's the regex? So far all my attempts have failed...
We can try using lookaheads which assert that each of the attr, bttr, and cttr attributes occurs inside the <tag>:
<tag (?=((?!\/>).)*\battr="0")(?=((?!\/>).)*\bbttr="0")(?=((?!\/>).)*\bcttr="0").*?\/>
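For an explanation, there are three lookaheads in the above pattern, one for each attribute. The pattern matches the literal value "0" used in the examples; if the values are unknown, each "0" can be replaced with "[^"]*". Here is how the first lookahead works: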
(?=              lookahead
  ((?!\/>).)*    consume any input, without passing the end of the <tag>,
  \battr="0"     then assert that we can find attribute 'attr' inside the tag
)
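To try the lookahead idea against the examples from the question, here is a minimal Python sketch. It is an illustration under a few assumptions rather than a definitive implementation: the helper names build_pattern and has_all_required are hypothetical, the value part is written as "[^"]*" so that any quoted value is accepted, and the groups are made non-capturing.

```python
import re

# Names of the required attributes, taken from the question.
REQUIRED = ("attr", "bttr", "cttr")


def build_pattern(names=REQUIRED):
    """Build one lookahead per required attribute (hypothetical helper)."""
    # (?:(?!/>).)* walks forward one character at a time but refuses to
    # step onto "/>", so each lookahead stays inside the current tag.
    # "[^"]*" is an assumption: it accepts any double-quoted value.
    lookaheads = "".join(
        r'(?=(?:(?!/>).)*\b' + re.escape(name) + r'="[^"]*")'
        for name in names
    )
    # Once every lookahead has succeeded, consume the rest of the tag lazily.
    return re.compile(r"<tag " + lookaheads + r".*?/>")


PATTERN = build_pattern()


def has_all_required(tag_text):
    """Return True if tag_text contains a <tag .../> with all required attributes."""
    return PATTERN.search(tag_text) is not None


print(has_all_required('<tag bttr="1" cttr="x" dar-vienas="0" attr="" />'))
print(has_all_required('<tag attr="0" dar-vienas="0" bttr="0" irdar-vienas="0" />'))
```

The first call prints True and the second prints False, because the second tag has no cttr. One caveat: \b treats a hyphen as a word boundary, so a hypothetical attribute such as x-attr would also satisfy the attr lookahead.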
Note that you were right to say that regex should not be used for parsing HTML. The best solution here would probably be to use some sort of DOM parser.
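As a rough illustration of the parser-based route, the sketch below uses Python's standard-library html.parser module. This is an event-based parser rather than a full DOM, chosen here only to avoid third-party dependencies. The class name RequiredAttrChecker and the function check_tags are hypothetical.

```python
from html.parser import HTMLParser

# Names of the required attributes, taken from the question.
REQUIRED = {"attr", "bttr", "cttr"}


class RequiredAttrChecker(HTMLParser):
    """Record, for each <tag>, whether it has every required attribute."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs. Self-closing tags such as
        # <tag ... /> also arrive here through the default handle_startendtag.
        if tag == "tag":
            names = {name for name, _value in attrs}
            self.results.append(REQUIRED <= names)


def check_tags(markup):
    """Return one boolean per <tag> found in markup (hypothetical helper)."""
    checker = RequiredAttrChecker()
    checker.feed(markup)
    checker.close()
    return checker.results


print(check_tags('<tag attr="0" dar-vienas="0" cttr="0" irdar-vienas="0" bttr="0" />'))
print(check_tags('<tag attr="0" dar-vienas="0" bttr="0" irdar-vienas="0" />'))
```

Because the parser splits attributes into names and values, the check compares whole names, so attribute order and extra attributes need no special handling.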