
Email address splitting

So I have a string that I need to split by semicolons.

Email address: "one@tw;,.'o"@hotmail.com;"some;thing"@example.com

Both of the email addresses are valid.

So I want to have a List<string> of the following:

  • "one@tw;,.'o"@hotmail.com
  • "some;thing"@example.com

But the way I am currently splitting the addresses is not working:

var addresses = emailAddressString.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
                .Select(x => x.Trim()).ToList();

Because of the multiple ; characters I end up with invalid email addresses.
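For illustration, here is a minimal sketch of what the naive split returns for the sample string. It is written with top-level statements (an assumption: C# 9 or later), and the variable name pieces is made up for this example.

```csharp
using System;
using System.Linq;

// The input from the question, as a verbatim string ("" is one literal quote).
var emailAddressString = @"""one@tw;,.'o""@hotmail.com;""some;thing""@example.com";

// The naive split cuts at every ';', including those inside the quoted local parts.
var pieces = emailAddressString
    .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
    .Select(x => x.Trim())
    .ToList();

foreach (var piece in pieces)
{
    Console.WriteLine(piece);
}
// Prints four fragments instead of two addresses:
// "one@tw
// ,.'o"@hotmail.com
// "some
// thing"@example.com
```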

I have tried a few different approaches, even checking whether the string contains quotes and then finding the index of each ; character and working it out from there, but it's a real pain.

Does anyone have any better suggestions?

Assuming that double quotes are not allowed except for the opening and closing quotes before the "at" sign @ , you can use this regular expression to capture e-mail addresses:

((?:[^@"]+|"[^"]*")@[^;]+)(?:;|$)

The idea is to capture either an unquoted part [^@"]+ or a quoted part "[^"]*" before the @ , and then capture everything up to a semicolon ; or the end anchor $ .

Demo of the regex.

var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world";
var mm = Regex.Matches(input, "((?:[^@\"]+|\"[^\"]*\")@[^;]+)(?:;|$)");
foreach (Match m in mm) {
    Console.WriteLine(m.Groups[1].Value);
}

This code prints

"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world

Demo 1.

If you would like to allow escaped double quotes inside double quotes, you could use a more complex expression:

((?:(?:[^@\"]|(?<=\\)\")+|\"([^\"]|(?<=\\)\")*\")@[^;]+)(?:;|$)

Everything else remains the same.

Demo 2.
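As a rough sketch of how the escaped-quote expression could be used from C#, the snippet below puts the pattern in a verbatim string and collects capture group 1 of each match. The helper name ExtractAddresses and the sample input are made up for illustration, and the code assumes top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every address captured by the escaped-quote pattern.
List<string> ExtractAddresses(string input)
{
    // Verbatim string: "" is a literal quote, and \\ is the regex for one backslash.
    const string pattern =
        @"((?:(?:[^@""]|(?<=\\)"")+|""([^""]|(?<=\\)"")*"")@[^;]+)(?:;|$)";

    var result = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
    {
        // Group 1 is the whole address without the trailing semicolon.
        result.Add(m.Groups[1].Value);
    }
    return result;
}

// Made-up sample: the first address has an escaped quote and a semicolon inside the quotes.
var sample = @"""some\""thing;x""@example.com;hello@world";
foreach (var address in ExtractAddresses(sample))
{
    Console.WriteLine(address);
}
// Prints:
// "some\"thing;x"@example.com
// hello@world
```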

I obviously started writing my anti-regex method at around the same time as juharr (another answer). Since I had already written it, I thought I would submit it anyway.

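    public static IEnumerable<string> SplitEmailsByDelimiter(string input, char delimiter)
    {
        var startIndex = 0;
        var delimiterIndex = 0;

        while (delimiterIndex >= 0)
        {
            // Search for the delimiter parameter rather than a hard-coded ';'.
            delimiterIndex = input.IndexOf(delimiter, startIndex);
            string substring = input;
            if (delimiterIndex >= 0)
            {
                substring = input.Substring(0, delimiterIndex);
            }

            // The candidate is complete if it has no quote at all,
            // or has both an opening and a closing quote.
            if (!substring.Contains("\"") || substring.IndexOf("\"") != substring.LastIndexOf("\""))
            {
                // Skip empty entries, e.g. the one after a trailing delimiter.
                if (substring.Length > 0)
                {
                    yield return substring;
                }
                input = input.Substring(delimiterIndex + 1);
                startIndex = 0;
            }
            else
            {
                // The delimiter is inside an open quote, so look for the next one.
                startIndex = delimiterIndex + 1;
            }
        }
    }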

Then the following

            var input = "blah@blah.com;\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;asdasd@asd.co.uk;";
            foreach (var email in SplitEmailsByDelimiter(input, ';'))
            {
                Console.WriteLine(email);
            }

would give this output

blah@blah.com
"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
asdasd@asd.co.uk

You can also do this without using regular expressions. The following extension method lets you specify a delimiter character and a character that begins and ends escape sequences. Note that it does not validate that all escape sequences are closed.

public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape)
{
    int beginIndex = 0;
    int length = 0;
    bool escaped = false;
    foreach (char c in str)
    {
        if (c == beginEndEscape)
        {
            escaped = !escaped;
        }

        if (!escaped && c == delimiter)
        {
            yield return str.Substring(beginIndex, length);
            beginIndex += length + 1;
            length = 0;
            continue;
        }

        length++;
    }

    yield return str.Substring(beginIndex, length);
}

Then the following

var input = "\"one@tw;,.'o\"@hotmail.com;\"some;thing\"@example.com;hello@world;\"D;D@blah;blah.com\"";
foreach (var address in input.SpecialSplit(';', '"'))
    Console.WriteLine(address);

will give this output

"one@tw;,.'o"@hotmail.com
"some;thing"@example.com
hello@world
"D;D@blah;blah.com"

Here's a version that also supports a single escape character. Two consecutive escape characters become one literal escape character. An escaped beginEndEscape character does not start or end an escape sequence, and an escaped delimiter does not split the string. Any other character that follows the escape character is kept as is, with the escape character removed.

public static IEnumerable<string> SpecialSplit(
    this string str, char delimiter, char beginEndEscape, char singleEscape)
{
    StringBuilder builder = new StringBuilder();
    bool escapedSequence = false;
    bool previousEscapeChar = false;
    foreach (char c in str)
    {
        if (c == singleEscape && !previousEscapeChar)
        {
            previousEscapeChar = true;
            continue;
        }

        if (c == beginEndEscape && !previousEscapeChar)
        {
            escapedSequence = !escapedSequence;
        }

        if (!escapedSequence && !previousEscapeChar && c == delimiter)
        {
            yield return builder.ToString();
            builder.Clear();
            continue;
        }

        builder.Append(c);
        previousEscapeChar = false;
    }

    yield return builder.ToString();
}
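To make the escaping rules concrete, here is a small sketch that runs a condensed, non-extension copy of the escape-aware splitter on a made-up input. The name SplitWithEscape, the shortened variable names, and the sample string are assumptions for this example, and the code uses top-level statements (C# 9 or later) so that it compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Condensed copy of the escape-aware splitter, written as a local function.
IEnumerable<string> SplitWithEscape(string str, char delimiter, char quote, char escape)
{
    var builder = new StringBuilder();
    bool inQuotes = false;
    bool afterEscape = false;
    foreach (char c in str)
    {
        if (c == escape && !afterEscape)
        {
            afterEscape = true;   // drop the escape character itself
            continue;
        }
        if (c == quote && !afterEscape)
        {
            inQuotes = !inQuotes; // an unescaped quote opens or closes a sequence
        }
        if (!inQuotes && !afterEscape && c == delimiter)
        {
            yield return builder.ToString();
            builder.Clear();
            continue;
        }
        builder.Append(c);
        afterEscape = false;
    }
    yield return builder.ToString();
}

// Made-up input, shown unescaped: a\;b;"c;d";e\"f
var parts = new List<string>(SplitWithEscape("a\\;b;\"c;d\";e\\\"f", ';', '"', '\\'));
foreach (var part in parts)
{
    Console.WriteLine(part);
}
// Prints:
// a;b
// "c;d"
// e"f
```

The escaped semicolon and the escaped quote survive as ordinary characters, while the backslashes themselves are removed.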

Finally, you should probably add a null check for the string that is passed in. Note that both versions return a sequence containing one empty string if you pass in an empty string.
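One possible way to add that check is sketched below. Because iterator methods defer their body until enumeration, the null check sits in a separate non-iterator wrapper so that it throws immediately. The names SplitChecked and SplitCore are hypothetical, the splitter is an index-based rewrite of the quote-toggling idea rather than the exact code above, and dropping empty entries is an assumption about what callers want. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: validates eagerly, then defers to the iterator.
IEnumerable<string> SplitChecked(string str, char delimiter, char quote)
{
    if (str == null)
    {
        throw new ArgumentNullException(nameof(str));
    }
    // Drop empty entries, which also covers the empty-string input.
    return SplitCore(str, delimiter, quote).Where(s => s.Length > 0);
}

// Index-based variant of the quote-toggling splitter (no escape character support).
IEnumerable<string> SplitCore(string str, char delimiter, char quote)
{
    int begin = 0;
    bool inQuotes = false;
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] == quote)
        {
            inQuotes = !inQuotes;
        }
        if (!inQuotes && str[i] == delimiter)
        {
            yield return str.Substring(begin, i - begin);
            begin = i + 1;
        }
    }
    yield return str.Substring(begin);
}

Console.WriteLine(SplitChecked("", ';', '"').Count());          // 0
Console.WriteLine(SplitChecked("a@b.com;;", ';', '"').Count()); // 1
```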
