
Finding optional groups with random order using regex

I'm trying to get the following using Regex.

This is sample input:

-emto=USER@HOST.COM -emfrom=USER@HOST.COM -emsubject="MYSUBJECT" 

Other input:

-emto=USER@HOST.COM -emfrom=USER@HOST.COM -emcc=ME@HOST.COM -embcc=YOU@HOST.COM -emsubject="MYSUBJECT" 

What I would like to achieve is to get named groups based on the text after -em. For example, I'd like to have groups EMAIL_TO, EMAIL_FROM, EMAIL_CC, and so on. Note that I could concatenate the group name and the capture in code, no problem.

The problem is that I don't know how to capture optional groups in "random" positions. For example, CC and BCC do not always appear, but when they do I need to capture them.

Can anybody help me out on this one?!

What I have so far: (?:-em(?<EMAIL_>to|cc|bcc|from|subject)=(.*))

Just do something like:

-em([^\s=]+)=([^\s]+)

If you need to support quoting of values, so that they can contain spaces:

-em([^\s=]+)=("[^"]*"|[^\s]+)

Then iterate over all the matches in the command-line argument string. For each match, look at the "key" (first capturing group) and see if it is one you recognize. If not, display an error message and exit. If it is, set the option accordingly (the second capturing group is the "value").
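To make the loop concrete, here is a minimal sketch in Python (the question does not name a language, so that choice is an assumption). The function name parse_email_options, the KNOWN_KEYS set, and the EMAIL_ prefixing are illustrative names built from the question's examples, not part of any library.

```python
import re

# Permissive pattern from the answer: any key after -em, value either quoted or bare.
OPTION_RE = re.compile(r'-em([^\s=]+)=("[^"]*"|[^\s]+)')

# Keys this hypothetical program accepts (taken from the question's examples).
KNOWN_KEYS = {"to", "from", "cc", "bcc", "subject"}


def parse_email_options(args):
    """Return a dict such as {"EMAIL_TO": "..."}; raise ValueError on an unknown key."""
    options = {}
    for match in OPTION_RE.finditer(args):
        key, value = match.group(1), match.group(2)
        if key not in KNOWN_KEYS:
            # The regex matched, so we can name the offending option.
            raise ValueError("unknown option: -em" + key)
        # Drop the surrounding quotes of a quoted value.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        # Build the group name in code, as the question suggests.
        options["EMAIL_" + key.upper()] = value
    return options
```

Because each match is handled independently, the order of the options and the absence of -emcc or -embcc do not matter; missing keys are simply not in the result.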

POSTSCRIPT: This reminds me of a situation which often comes up when writing a grammar for a computer language.

It is possible (perhaps even natural) to write a grammar which only works for syntactically perfect programs. But for good error reporting, it is much better to write a grammar which accepts a superset of syntactically correct programs. After you get the parse tree, you can run over it, look for errors, and report them using application-specific code.

In this case, you could write a regex which will only match the options which you actually accept. But then if someone mistypes an option, the regex will simply fail to match. Your program will not be able to provide any specific error message, whether the command-line args are -emsubjcet=something or something completely off the wall like @@#$*(#&U*REJDFFKDSJ**&#(*$&##.
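A small comparison shows the difference, again sketched in Python as an assumed host language. STRICT_RE is a hypothetical "only what we accept" pattern written for this illustration; PERMISSIVE_RE is the quoted-value regex from above.

```python
import re

# Hypothetical strict pattern: matches only the accepted keys.
STRICT_RE = re.compile(r'-em(to|cc|bcc|from|subject)=("[^"]*"|[^\s]+)')
# Permissive pattern: matches any key, so typos can be reported by name.
PERMISSIVE_RE = re.compile(r'-em([^\s=]+)=("[^"]*"|[^\s]+)')

typo = "-emsubjcet=something"

# The strict regex finds nothing, so there is nothing to build an error message from.
strict_keys = [m.group(1) for m in STRICT_RE.finditer(typo)]
# The permissive regex still captures the mistyped key.
permissive_keys = [m.group(1) for m in PERMISSIVE_RE.finditer(typo)]
```

With the permissive pattern the program knows the user wrote "subjcet" and can say so; with the strict one it only knows that nothing matched.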

POST-POSTSCRIPT: Note the very common regex pattern of matching "delimiter + any number of characters which are not a delimiter". In my above regexes, you can see this here: ([^\s=]+)= -- 1 or more chars which are not whitespace OR =, followed by =. This allows us to easily eat everything which is part of the key, but not go too far and match the delimiting =. You can see it again here: "[^"]*" -- a quote mark, followed by 0 or more chars which are not a quote mark, followed by a closing quote mark.
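To see why the negated character class matters, compare it with a greedy (.*) for the value, as in the attempt from the question. This is a sketch in Python (an assumed language); the variable names are only for illustration.

```python
import re

line = "-emto=USER@HOST.COM -emfrom=USER@HOST.COM"

# Greedy value: .* runs to the end of the line and swallows the next option.
greedy = re.search(r'-em([^\s=]+)=(.*)', line)
# Bounded value: [^\s]+ stops at the first whitespace, the delimiter between options.
bounded = re.search(r'-em([^\s=]+)=([^\s]+)', line)

greedy_value = greedy.group(2)
bounded_value = bounded.group(2)
```

Here greedy_value ends up containing the whole rest of the line, including -emfrom=..., while bounded_value is just the address for -emto.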
