Regular Expression - Match Email Address with Exceptions

Please read the question carefully, it's not about validating email addresses!

I'm trying to construct a regular expression (currently in C#) that extracts all email addresses from a text, with two specific exceptions.

I have:

  • user1@company.com
  • user2@company.com
  • user3@company.com
  • user1@private.com
  • user2@private.com

all in the same text file, on the same line, delimited by whitespace.

At first I tried to match all of these email addresses except the ones starting with "user1". I used:

[\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,}

which works well. Now I have another requirement that says: also do not match if the complete email address is "user2@private.com". It should still match "user2@company.com", so I can't use:

[\S]*(?<!(user1|user2))@[\S]*\..[a-zA-Z.]{1,}

Therefore I tried an additional negative lookbehind:

([\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,})(?<!user2@private\.com)

which doesn't work, apparently because the regex is satisfied with matching just "user2@private.co". Is there any way to achieve what I'm trying to do? My head already hurts...

I would use additional code, but I'm using third-party software that only accepts a regular expression, and only a single one, so that's all I've got.
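The behaviour guessed at above can be reproduced in isolation. The following is a minimal sketch (the class and method names are made up for illustration) that runs the third pattern against the single address user2@private.com. The trailing lookbehind rejects the full address, so the engine backtracks by one character and reports a shorter match instead of no match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // The third pattern from the question, unchanged.
    public const string Pattern =
        @"([\S]*(?<!user1)@[\S]*\..[a-zA-Z.]{1,})(?<!user2@private\.com)";

    // Returns the first match in the input, or an empty string when nothing matches.
    public static string FirstMatch(string input)
    {
        return Regex.Match(input, Pattern).Value;
    }

    public static void Main()
    {
        // The lookbehind fails right after "...com", so the last quantifier
        // gives back one character and the lookbehind then passes.
        Console.WriteLine(FirstMatch("user2@private.com")); // user2@private.co
    }
}
```

So a lookbehind placed after a quantified tail does not reject the address, it only trims it; the exclusion has to be checked before the address is consumed.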

A single-regex solution, though not a pretty one, is

(?<!\S)(?!user1@|user2@private\.com(?!\S))\S+@\S+\.[a-zA-Z]{2,}(?!\S)

See the regex demo.

Details:

  • (?<!\S) - a position not preceded by a non-whitespace char (the start of the string or right after whitespace)
  • (?!user1@|user2@private\.com(?!\S)) - the text at that position must not start with user1@, and must not be user2@private.com as a whole token (that is, user2@private.com not followed by a non-whitespace char)
  • \S+ - one or more non-whitespace chars
  • @ - a literal @
  • \S+ - one or more non-whitespace chars
  • \. - a literal dot
  • [a-zA-Z]{2,}(?!\S) - two or more ASCII letters not followed by a non-whitespace char.
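To see the pieces above working together, here is a small sketch that applies the pattern to the five addresses from the question, placed on one line. SingleRegexDemo and Extract are illustrative names, not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SingleRegexDemo
{
    // The single-regex solution, written as a C# verbatim string.
    public const string Pattern =
        @"(?<!\S)(?!user1@|user2@private\.com(?!\S))\S+@\S+\.[a-zA-Z]{2,}(?!\S)";

    // Returns every address in the text that survives both exclusions.
    public static string[] Extract(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string line = "user1@company.com user2@company.com user3@company.com " +
                      "user1@private.com user2@private.com";
        foreach (string address in Extract(line))
            Console.WriteLine(address);
        // Prints user2@company.com and user3@company.com, one per line.
    }
}
```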

A more readable approach is to split on whitespace, keep the items matching @"^\S+@\S+\.\S+$", and use a bit of code to filter out the unwanted matches:

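using System;
using System.Linq;
using System.Text.RegularExpressions;

var s = @"Text user1@company.com here user2@company.com and user3@company.com here user1@private.com more user2@private.com";
// Split on whitespace, keep address-shaped tokens, then drop the two exceptions.
var result = s.Split().Where(m => 
        Regex.IsMatch(m, @"^\S+@\S+\.\S+$") && m != "user2@private.com" && !m.StartsWith("user1@"));
foreach (var str in result)
    Console.WriteLine(str);
// Prints user2@company.com and user3@company.com, one per line.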

See the C# demo.

You should be able to use a negative lookahead instead. The following solution should work if you have explicit emails you need to filter out, but keep in mind that it does not scale: you would not want thousands of addresses listed in the pattern.

^(?!user1|user2(?!@company.com))[\S]*@[\S]*\..[a-zA-Z.]{1,}
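Because the pattern begins with ^, against the whole whitespace-delimited line it is only tried at the start of the input. The following is a minimal sketch of one way around that, assuming the text can be split into tokens first (which the asker's tool may not allow). AnchoredDemo and KeepAllowed are illustrative names, and the pattern is written with single backslashes inside a C# verbatim string.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AnchoredDemo
{
    // The lookahead pattern from this answer.
    public const string Pattern =
        @"^(?!user1|user2(?!@company.com))[\S]*@[\S]*\..[a-zA-Z.]{1,}";

    // Splits on whitespace and keeps the tokens the anchored pattern accepts.
    public static string[] KeepAllowed(string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                   .Where(token => Regex.IsMatch(token, Pattern))
                   .ToArray();
    }

    public static void Main()
    {
        string line = "user1@company.com user2@company.com user3@company.com " +
                      "user1@private.com user2@private.com";

        // On the full line the pattern is tested once, at position 0,
        // where the lookahead sees "user1" and rejects.
        Console.WriteLine(Regex.IsMatch(line, Pattern)); // False

        foreach (string address in KeepAllowed(line))
            Console.WriteLine(address);
        // Prints user2@company.com and user3@company.com, one per line.
    }
}
```

Note that user2(?!@company.com) excludes every user2 address outside company.com, which is broader than excluding only user2@private.com.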

If you expect many more such rules in the future, you might need to think about a better approach. If the emails to be filtered out are explicit (not patterns), you could maintain a blacklist somewhere and filter them out after you have extracted/validated the email address patterns.
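A rough sketch of that blacklist idea follows. The set contents, the class name and the simple extraction pattern are assumptions made for illustration; in practice the list might be loaded from a file or a database. Since the blacklist holds explicit addresses only, the "starts with user1" rule is spelled out as two entries here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlacklistDemo
{
    // Hypothetical blacklist of explicit addresses, compared case-insensitively.
    private static readonly HashSet<string> Blacklist =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            "user1@company.com",
            "user1@private.com",
            "user2@private.com",
        };

    // Extracts whitespace-delimited address-shaped tokens, then drops blacklisted ones.
    public static List<string> Extract(string text)
    {
        return Regex.Matches(text, @"(?<!\S)\S+@\S+\.[a-zA-Z]{2,}(?!\S)")
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .Where(address => !Blacklist.Contains(address))
                    .ToList();
    }

    public static void Main()
    {
        string line = "user1@company.com user2@company.com user3@company.com " +
                      "user1@private.com user2@private.com";
        foreach (string address in Extract(line))
            Console.WriteLine(address);
        // Prints user2@company.com and user3@company.com, one per line.
    }
}
```

A set lookup stays cheap as the list grows, whereas every extra address in a lookahead alternation makes the pattern longer and harder to maintain.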

