
Perl: regex for matching after an optional character

I need to take a string that can have one of 4 formats:

  1. html
  2. text
  3. attachment
  4. email:[address]

I need a regular expression that will correctly capture 2 things: the $type, which is html, text, attachment, or email, and the $arg, which is [address] if $type is email, and undef otherwise. If $type is not email, then there should be no matches at all. I've written this regex:

m/(html|email|text|attachment):?(.*)/;

The problem with this is that it matches even if something trails text, html, or attachment, and it also matches if there is no colon. So, for instance, emailme@foo.com would give ("email", "me@foo.com"). I also tried this one:

m/(html)|(email):(.*)|(text)|(attachment)/;

This results in 5 groups. Is there a way to capture the way I want, so that I will get no matches if there is no colon after email, or if there IS a colon after something else?

Yes, to do that you can use the branch reset feature: (?|...|...|...)

/(?|(html)|(email):(.*)|(text)|(attachment))/

In a branch reset, capture groups of each alternative have the same numbers.
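To make the numbering concrete, here is a minimal sketch using the pattern above (branch reset needs Perl 5.10 or later). The helper name parse_loose and the sample inputs are only illustrative assumptions. Whichever alternative matches, the type lands in $1 and the address, if any, in $2:

```perl
use strict;
use warnings;

# Branch reset: each alternative numbers its groups starting from 1,
# so the pattern has two groups in total instead of five.
my $re = qr/(?|(html)|(email):(.*)|(text)|(attachment))/;

# Hypothetical helper: returns ($type, $arg), or an empty list on failure.
sub parse_loose {
    my ($input) = @_;
    return $input =~ $re;
}

for my $input ('html', 'email:me@foo.com', 'attachment') {
    my ($type, $arg) = parse_loose($input);
    printf "%-18s type=%s arg=%s\n",
        $input, $type // 'undef', $arg // 'undef';
}
```

Note that this pattern is still unanchored, so an input such as html:foo matches on the html part alone.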

To reject "html", "text", or "attachment" followed by anything else (including a colon), you need a condition on the right-hand side of the pattern (an anchor, a lookahead, or similar). The same applies at the beginning of the pattern.
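One way to add those conditions, assuming the whole string must be exactly one of the four formats, is to anchor both ends with \A and \z. This is a sketch rather than the only option, and parse_format is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($type, $arg) on success and an empty list
# otherwise. \A and \z force the pattern to cover the entire string, so
# trailing text after html/text/attachment makes the match fail.
sub parse_format {
    my ($input) = @_;
    return $input =~ /\A(?|(html)|(email):(.*)|(text)|(attachment))\z/;
}

for my $input ('html', 'email:me@foo.com', 'html:foo', 'emailme@foo.com') {
    my ($type, $arg) = parse_format($input);
    if (defined $type) {
        printf "%-18s type=%s arg=%s\n", $input, $type, $arg // 'undef';
    }
    else {
        printf "%-18s no match\n", $input;
    }
}
```

Because the address group is still (.*), the input email: with nothing after the colon matches with an empty $arg; changing it to (.+) would reject that case if an address is required.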
