Removing special characters with a regex while keeping a valid email format

Question

I'm doing this in C#. I start with email-like strings in the following format:

employee[any characters]@company[any characters].com

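I want to strip the non-alphanumeric characters from the [any characters] parts.

For example, I want this "employee1@2 r&a*d.m32@@company98 ';99..com"

to become this "employee12radm32@company9899.com"

The expression below just strips out every special character, but I want to keep one @ before company and one . before com. So I need the expression to ignore or mask the employee, @company and .com parts... I'm just not sure how to do that.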

var regex = new Regex("[^0-9a-zA-Z]"); //whitelist the acceptables, remove all else.

Answer 1

You can use the following regex:

(?:\W)(?!company|com)

It replaces any special character unless it is followed by company (so @company is kept) or by com (so .com is kept):

employee1@2 r&a*d.m32@@company98 ';99..com

becomes

employee12radm32@company9899.com

See: http://regex101.com/r/fY8jD7/2

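Note that you need the g modifier to replace all occurrences of such unwanted characters. Replacing every match is the default in C#, so you can use a plain Regex.Replace():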

https://dotnetfiddle.net/iTeZ4F
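For readers who cannot open the fiddle, here is a minimal sketch of how the pattern might be applied with Regex.Replace(). The class and method names (EmailCleaner, Clean) are made up for illustration and are not taken from the linked fiddle.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EmailCleaner
{
    // Removes every non-word character unless it is immediately
    // followed by "company" or "com". Regex.Replace replaces all matches.
    public static string Clean(string input)
    {
        return Regex.Replace(input, @"(?:\W)(?!company|com)", string.Empty);
    }

    public static void Main()
    {
        var email = "employee1@2 r&a*d.m32@@company98 ';99..com";
        Console.WriteLine(Clean(email)); // employee12radm32@company9899.com
    }
}
```

Only the last @ before company and the last . before com survive, because each earlier one is followed by another special character rather than by the keyword itself.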

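Update:

Of course, the regex (?:\W)(?!com) would be sufficient, because company itself starts with com. It would still leave behind parts such as #com or ~companion, though, since the lookahead matches them as well. So this still does not guarantee that the input (or rather the transformed result) is 100% valid. You should consider simply throwing a validation error instead of trying to clean up the input to fit your needs.

And even if you manage to handle that case too: what if @company or .com appears twice?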
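One way to act on that advice is to clean first and then reject anything that still does not have the expected shape. The sketch below assumes the target shape is exactly employee, alphanumerics, @company, alphanumerics, .com; the names StrictEmailCheck and CleanOrThrow are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StrictEmailCheck
{
    // Assumed target shape: employee<alnum>@company<alnum>.com
    private static readonly Regex Expected =
        new Regex(@"^employee[0-9a-zA-Z]*@company[0-9a-zA-Z]*\.com$");

    public static string CleanOrThrow(string input)
    {
        // Same cleanup as the shortened pattern from the update.
        var cleaned = Regex.Replace(input, @"(?:\W)(?!com)", string.Empty);

        // Leftovers such as "#com" or a second "@company" fail this check.
        if (!Expected.IsMatch(cleaned))
        {
            throw new ArgumentException("Invalid email after cleanup: " + cleaned);
        }
        return cleaned;
    }

    public static void Main()
    {
        Console.WriteLine(CleanOrThrow("employee1@2 r&a*d.m32@@company98 ';99..com"));
        try
        {
            CleanOrThrow("employee1#com@company2.com"); // '#' survives the cleanup
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The second call throws because the # in front of com is kept by the lookahead and is not an allowed character in the strict pattern.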

Answer 2

You can simplify your regex and replace it with

tmp = Regex.Replace(n, @"\W+", "");

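where \w stands for all letters, digits and the underscore, and \W is the negated version of \w. In general, it is better to create a whitelist of allowed characters than to try to predict every disallowed symbol.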
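A small sketch of what this replacement does to the sample input; the helper name StripNonWord is made up for illustration. Note that on its own it also removes the @ and the . that the question wants to keep, and that it leaves underscores in place.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Removes every run of non-word characters.
    public static string StripNonWord(string input)
    {
        return Regex.Replace(input, @"\W+", "");
    }

    public static void Main()
    {
        // '@' and '.' are non-word characters, so they are removed as well.
        Console.WriteLine(StripNonWord("employee1@2 r&a*d.m32@@company98 ';99..com"));
        // employee12radm32company9899com

        // '_' is a word character, so it is kept.
        Console.WriteLine(StripNonWord("a_b-c")); // a_bc
    }
}
```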

Answer 3

I would probably write something like this:

(Case is ignored; if you need it to be case sensitive, please leave a comment.)

DotNetFiddle example

using System;
using System.Linq;

public class Program
{
    public static void Main()
    {
        var email = "employee1@2 r&a*d.m32@@company98 ';99..com";

        var result = GetValidEmail(email);

        Console.WriteLine(result);
    }


    public static string GetValidEmail(string email)
    {
      var result = email.ToLower();

      // Does it contain everything we need? Check the lower-cased copy so case is ignored.
      if (result.StartsWith("employee")
          && result.EndsWith(".com")
          && result.Contains("@company"))
      {
        // remove "employee" (8 chars) from the start and ".com" (4 chars) from the end.
        result = result.Substring(8, result.Length - 12);
        // remove @company
        var split = result.Split(new string[] { "@company" },
          StringSplitOptions.RemoveEmptyEntries);

        // validate we have exactly two parts (you may not need this)
        if (split.Length != 2)
        {
          throw new ArgumentException("Invalid Email.");
        }

        // recreate valid email
        result = "employee"
          + new string (split[0].Where(c => char.IsLetterOrDigit(c)).ToArray())
          + "@company"
          + new string (split[1].Where(c => char.IsLetterOrDigit(c)).ToArray())
          + ".com";

      }
      else
      {
        throw new ArgumentException("Invalid Email.");
      }

      return result;
    }
}

Result

employee12radm32@company9899.com

Answer 4

@dognose provided a great regex solution. I'll leave my answer here for reference, but I would go with his, since it is much shorter and cleaner.

var companyName = "company";
var extension = "com";
var email = "employee1@2 r&a*d.m32@@company98 ';99..com";

var tempEmail = Regex.Replace(email, @"\W+", "");

var companyIndex = tempEmail.IndexOf(companyName);
var extIndex = tempEmail.LastIndexOf(extension);

var fullEmployeeName = tempEmail.Substring(0, companyIndex);
var fullCompanyName = tempEmail.Substring(companyIndex, extIndex - companyIndex);

var validEmail = fullEmployeeName + "@" + fullCompanyName + "." + extension;

Answer 5

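What you are trying to do, while possible, would be a bit complicated with a single regex pattern. You can break the problem down into smaller steps. One way is to extract the "Username" and "Domain" groups (basically the [any characters] parts you described), "fix" each group, and then substitute the fixed value for the original one. Something like this: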

// Original input to transform.
string input = @"employee1@2 r&a*d.m32@@company98 ';99..com";

// Regular expression to find and extract "Username" and "Domain" groups, if any.
// The dot before "com" is escaped so that it only matches a literal '.'.
var matchGroups = Regex.Match(input, @"employee(?<UsernameGroup>(.*))@company(?<DomainGroup>(.*))\.com");

string validInput = input;

// Get the username group from the list of matches.
var usernameGroup = matchGroups.Groups["UsernameGroup"];

if (!string.IsNullOrEmpty(usernameGroup.Value))
{
    // Replace non-alphanumeric values with empty string.
    string validUsername = Regex.Replace(usernameGroup.Value, "[^a-zA-Z0-9]", string.Empty);

    // Replace the invalid instance with the valid one.
    validInput = validInput.Replace(usernameGroup.Value, validUsername);
}

// Get the domain group from the list of matches.
var domainGroup = matchGroups.Groups["DomainGroup"];

if (!string.IsNullOrEmpty(domainGroup.Value))
{
    // Replace non-alphanumeric values with empty string.
    string validDomain = Regex.Replace(domainGroup.Value, "[^a-zA-Z0-9]", string.Empty);

    // Replace the invalid instance with the valid one.
    validInput = validInput.Replace(domainGroup.Value, validDomain);
}

Console.WriteLine(validInput);

This will output employee12radm32@company9899.com.
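One caveat with the snippet above: string.Replace substitutes every occurrence of the captured text, so a very short group value (for example a single ".") would also be replaced elsewhere in the string. A possible variant, sketched here with hypothetical names (GroupRebuild, Rebuild) and an added assumption that the whole input must match the pattern, rebuilds the address from the cleaned groups instead:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupRebuild
{
    public static string Rebuild(string input)
    {
        // Anchored so that the whole input must have the expected shape.
        var match = Regex.Match(input,
            @"^employee(?<user>.*)@company(?<domain>.*)\.com$");
        if (!match.Success)
        {
            throw new ArgumentException("Input does not match the expected shape.");
        }

        // Clean each captured part separately.
        string user = Regex.Replace(match.Groups["user"].Value, "[^a-zA-Z0-9]", string.Empty);
        string domain = Regex.Replace(match.Groups["domain"].Value, "[^a-zA-Z0-9]", string.Empty);

        // Reassemble from the fixed pieces instead of replacing inside the input.
        return "employee" + user + "@company" + domain + ".com";
    }

    public static void Main()
    {
        Console.WriteLine(Rebuild("employee1@2 r&a*d.m32@@company98 ';99..com"));
        // employee12radm32@company9899.com
        Console.WriteLine(Rebuild("employee.@company..com"));
        // employee@company.com
    }
}
```

Because the fixed parts are concatenated rather than searched for, the literal @company and .com can never be altered by the cleanup.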

5 answers

Answer 1: score 3 (accepted), 2014-10-29 21:47:59
Answer 2: score 0, 2014-10-29 21:26:18
Answer 3: score 0, 2014-10-29 21:34:04
Answer 4: score 0, 2014-10-29 21:37:57
Answer 5: score 0, 2014-10-29 21:53:36
