
How can I group multiple e-mail addresses and user names using a regular expression

I have the following text that I am trying to parse:

"user1@emailaddy1.com" <user1@emailaddy1.com>, "Jane Doe" <jane.doe@addyB.org>,
"joe@company.net" <joe@company.net>

I am using the following code to try and split up the string:

' Requires: Imports System.Text.RegularExpressions
Dim groups As GroupCollection
Dim matches As MatchCollection
Dim regexp1 As New Regex("""(.*)"" <(.*)>")
matches = regexp1.Matches(toNode.InnerText)
For Each match As Match In matches
    groups = match.Groups
    message.CompanyName = groups(1).Value
    message.CompanyEmail = groups(2).Value
Next

But this regular expression is greedy and is grabbing the entire string up to the last quote after "joe@company.net". I'm having a hard time putting together an expression that will group this string into the two groups I'm looking for: Name (in the quotes) and E-Mail (in the angle brackets). Does anybody have any advice or suggestions for altering the regexp to get what I need?

Rather than rolling your own regular expression, I would do this:

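// Requires: using System; using System.Net.Mail;
string[] addresses = toNode.InnerText.Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string textAddress in addresses)
{
    // The foreach variable is read-only, so trim into a new local.
    string trimmed = textAddress.Trim();
    MailAddress address = new MailAddress(trimmed);
    message.CompanyName = address.DisplayName;
    message.CompanyEmail = address.Address;
}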

While your regular expression may work for the few test cases you have shown, the MailAddress class (in System.Net.Mail) will probably be much more reliable in the long run.
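Splitting on ',' breaks as soon as a display name itself contains a comma (for example "Doe, Jane"). A minimal C# sketch that avoids the manual split is shown below. It assumes .NET 5 or later (C# 9 top-level statements) and a hard-coded sample string in place of toNode.InnerText, and hands the whole header to MailAddressCollection.Add(string), which parses a comma-separated address list and respects quoted display names. The "Doe, Jane" name is a hypothetical variation on the question's data:

```csharp
using System;
using System.Net.Mail;

// Hypothetical sample: the question's input, with a display name that contains a comma.
string header = "\"user1@emailaddy1.com\" <user1@emailaddy1.com>, \"Doe, Jane\" <jane.doe@addyB.org>, " +
                "\"joe@company.net\" <joe@company.net>";

// MailAddressCollection.Add(string) parses a comma-separated list of addresses.
// Commas inside a quoted display name are not treated as separators.
var parsed = new MailAddressCollection();
parsed.Add(header);

foreach (MailAddress address in parsed)
{
    // DisplayName comes back without the surrounding quotes.
    Console.WriteLine($"{address.DisplayName} -> {address.Address}");
}
```

If the input can contain malformed entries, Add throws a FormatException, so you may want to wrap the call in try/catch.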

How about """([^""]*)"" <([^>]*)>" for the regex? That is, make it explicit that the matched part cannot include a quote or a closing angle bracket. You may also want to use a more restrictive character class instead.
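Here is the negated-character-class version as a small C# sketch (C# 9 top-level statements, .NET 5 or later), run against the question's sample text hard-coded as a single string. It also collects every pair into a list, since the loop in the question overwrites message.CompanyName and message.CompanyEmail on each iteration; the results list is an illustrative name, not part of the original code:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The question's sample input, joined onto one line.
string text = "\"user1@emailaddy1.com\" <user1@emailaddy1.com>, \"Jane Doe\" <jane.doe@addyB.org>, " +
              "\"joe@company.net\" <joe@company.net>";

// [^"]* cannot run past a closing quote and [^>]* cannot run past a closing '>',
// so each match stops at the end of its own address.
var pattern = new Regex("\"([^\"]*)\" <([^>]*)>");

// Collect every (name, e-mail) pair instead of overwriting one field per iteration.
var results = new List<(string Name, string Email)>();
foreach (Match m in pattern.Matches(text))
{
    results.Add((m.Groups[1].Value, m.Groups[2].Value));
}

foreach (var (name, email) in results)
{
    Console.WriteLine($"{name} | {email}");
}
```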

You need to specify that you want the minimal (non-greedy) match. You can also replace the (.*) patterns with more precise ones: for example, you could exclude the comma and the space. It is usually better to avoid .* in a regular expression, because it matches more than you intend and forces the engine to backtrack, which can hurt performance.

For example, for the e-mail address you can use a pattern like [\w.-]+@([\w-]+\.)+[\w-]+ (double each backslash inside a regular C# string literal), or a more complex one.
You can find some good patterns at http://regexlib.com/
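A sketch along these lines combines a quote-free name with a simple e-mail shape, assuming C# 9 top-level statements and the question's sample text hard-coded. The e-mail part is only an illustration, not a complete address grammar: it allows dots in the local part (so jane.doe matches) and escapes the dot between domain labels. Neither group can contain a comma or a space, and the named groups (name, email) are chosen here for readability:

```csharp
using System;
using System.Text.RegularExpressions;

// Name: anything except a double quote.
// E-mail: a deliberately simple shape (local part, '@', dotted domain labels).
const string NamePart = "\"(?<name>[^\"]*)\"";
const string EmailPart = @"<(?<email>[\w.-]+@(?:[\w-]+\.)+[\w-]+)>";
var pattern = new Regex(NamePart + @"\s*" + EmailPart);

string text = "\"user1@emailaddy1.com\" <user1@emailaddy1.com>, \"Jane Doe\" <jane.doe@addyB.org>, " +
              "\"joe@company.net\" <joe@company.net>";

MatchCollection matches = pattern.Matches(text);
foreach (Match m in matches)
{
    Console.WriteLine($"{m.Groups["name"].Value} -> {m.Groups["email"].Value}");
}
```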

ASP.NET uses the .NET regular expression engine (System.Text.RegularExpressions), which supports non-greedy quantifiers, so try the non-greedy variant by adding a ? after each * in the regex.

Example regex (written as a VB.NET string literal, like the one in the question):

"""(.*?)"" <(.*?)>"
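To see the difference, this C# sketch (C# 9 top-level statements, sample text hard-coded) runs the question's greedy pattern and the lazy variant over the same input. The greedy (.*) first consumes the rest of the string and backtracks only as far as the last closing quote, so the whole input becomes one match; the lazy (.*?) stops at the first point where the rest of the pattern can match:

```csharp
using System;
using System.Text.RegularExpressions;

string text = "\"user1@emailaddy1.com\" <user1@emailaddy1.com>, \"Jane Doe\" <jane.doe@addyB.org>, " +
              "\"joe@company.net\" <joe@company.net>";

// Greedy: group 1 runs up to the LAST quote-space-'<' in the input.
var greedy = new Regex("\"(.*)\" <(.*)>");

// Lazy: each group grows one character at a time and stops at the FIRST delimiter.
var lazy = new Regex("\"(.*?)\" <(.*?)>");

MatchCollection greedyMatches = greedy.Matches(text);
MatchCollection lazyMatches = lazy.Matches(text);

Console.WriteLine($"greedy: {greedyMatches.Count} match(es)");
foreach (Match m in lazyMatches)
{
    Console.WriteLine($"lazy: {m.Groups[1].Value} | {m.Groups[2].Value}");
}
```

One design note: if a closing quote is ever missing, .*? can still run across a comma into the next address, whereas [^"]* cannot cross a quote at all.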
