I need to take a string that can have one of 4 formats:
html
text
attachment
email:[address]
I need a regular expression that will correctly capture 2 things: the $type, which is html, text, attachment, or email, and the $arg, which is [address] if $type is email, and undef otherwise. If $type is not email, then there should be no matches at all. I've written this regex:
m/(html|email|text|attachment):?(.*)/;
This has the problem that it will match even if something trails text, html, or attachment, and it will also match if there is no : after email. So, for instance, emailme@foo.com would give ("email", "me@foo.com"). I also tried this one:
m/(html)|(email):(.*)|(text)|(attachment)/;
This results in 5 groups. Is there a way to capture the way I want, so that I will get no matches if there is no colon after email, or if there IS a colon after something else?
Yes, to do that you can use the branch reset feature: (?|...|...|...)
/(?|(html)|(email):(.*)|(text)|(attachment))/
In a branch reset, capture groups of each alternative have the same numbers.
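As a rough illustration of that numbering, the sketch below runs both patterns in list context and compares what comes back. The variable names @plain and @reset are only for this example, and it assumes Perl 5.10 or later, where branch reset is available.

```perl
use strict;
use warnings;

# Without branch reset there are five groups, so the type can land
# in $1, $2, $4, or $5 depending on which alternative matched.
my @plain = 'text' =~ /(html)|(email):(.*)|(text)|(attachment)/;

# With branch reset every alternative restarts its numbering at 1,
# so the type is always $1 and the address (if any) is always $2.
my @reset = 'email:me@foo.com' =~ /(?|(html)|(email):(.*)|(text)|(attachment))/;

print scalar(@plain), " groups without branch reset\n";
print scalar(@reset), " groups with branch reset: @reset\n";
```

In list context a successful match returns one element per group in the pattern, with undef for groups that did not take part, which is why the first list has five slots and the second only two.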
To exclude "html", "text", or "attachment" followed by anything else (including a colon), you need a condition on the right (an anchor, a lookahead, or similar). The same applies at the beginning.
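One possible way to add those conditions is to anchor the whole pattern with \A and \z, as in the sketch below. The helper name parse_format is hypothetical, and anchors are just one of the options mentioned above; a lookahead would serve if the format is embedded in a longer string. It assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# parse_format is a hypothetical helper, not part of the original post.
# \A and \z anchor the whole string, so nothing may precede or follow
# a format, and only "email" may be followed by a colon.
sub parse_format {
    my ($str) = @_;
    my ($type, $arg) =
        $str =~ /\A(?|(html)|(email):(.*)|(text)|(attachment))\z/;
    return ($type, $arg);   # both undef when the string does not match
}

for my $input ('html', 'email:me@foo.com', 'emailme@foo.com', 'text:oops') {
    my ($type, $arg) = parse_format($input);
    printf "%-18s => type=%s arg=%s\n",
        $input, $type // 'undef', $arg // 'undef';
}
```

With this pattern, html yields a type with an undefined argument, email:me@foo.com yields both values, and the last two inputs do not match at all. Note that (.*) still accepts an empty address after the colon; changing it to (.+) would reject a bare "email:".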