How can I split this string into an array?

My string is as follows:

smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;

I need to get back:

smtp:jblack@test.com
SMTP:jb@test.com
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;

The problem is that the semicolons separate the addresses and are also part of the X400 address. Can anyone suggest how best to split this?

P.S. I should have mentioned that the order can differ, so it could be:

X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com

There can be more than 3 addresses (4, 5, ... 10, etc.), including an X500 address; however, they all start with either smtp:, SMTP:, X400 or X500.

EDIT: With the updated information, this answer certainly won't do the trick, but it's still potentially useful, so I'll leave it here.

Will you always have three parts, and you just want to split on the first two semicolons?

If so, just use the overload of Split which lets you specify the maximum number of substrings to return:

string[] bits = text.Split(new char[]{';'}, 3);
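
To make the caveat from the EDIT concrete, here is a minimal sketch of the count overload applied to both sample strings from the question. The variable names are only illustrative, and the snippet assumes C# 9 top-level statements (wrap it in a Main method on older compilers). It only works when the X400 address comes last.

```csharp
using System;

string input1 = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com";

// Return at most 3 substrings; the last one keeps all remaining semicolons.
string[] first = input1.Split(new char[] { ';' }, 3);
Console.WriteLine(first[2]);  // X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;

// With the X400 address first, the split lands inside that address instead.
string[] second = input2.Split(new char[] { ';' }, 3);
Console.WriteLine(second[0]); // X400:C=US
Console.WriteLine(second[1]); // A= (with a trailing space)
```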

May I suggest building a regular expression:

(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?

or, protocol-less:

.*?:((?![^:;]*:).)*;?

In other words, find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not at the start of one of your protocols. Finish with a semicolon (optionally).

You can then go through the list of matches, splitting each on ':', and you'll have your protocols. Additionally, if you want to add protocols, just add them to the list.

However, you will likely want to make the whole pattern case-insensitive and list each protocol only once, in either uppercase or lowercase.

The protocol-less version doesn't care what the names of the protocols are. It finds them all the same: after the colon it keeps matching characters until the text ahead is a run of characters other than colons and semicolons followed by a colon, which marks the start of the next address.
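
As a rough sketch of how the protocol-list pattern might be used from C#: the helper name FindAddresses and the way the pattern is assembled are assumptions for illustration, not part of the answer above. It lists each protocol once and relies on RegexOptions.IgnoreCase, and it assumes C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every "protocol:value" run found in the text.
static List<string> FindAddresses(string text)
{
    // Each protocol is listed once; IgnoreCase covers smtp/SMTP and x400/X400.
    const string protocols = "smtp|x400|x500";
    // A protocol and a colon, then any characters up to the next "protocol:".
    string pattern = "(?:" + protocols + "):(?:(?!(?:" + protocols + "):).)*";

    var found = new List<string>();
    foreach (Match m in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
    {
        found.Add(m.Value);
    }
    return found;
}

string sample = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com";
foreach (string address in FindAddresses(sample))
{
    Console.WriteLine(address);
}
```

Each match keeps the semicolons that follow it, so trim them with TrimEnd(';') where they are not wanted. To support another protocol, add its name to the protocols string.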

Split by the following regex pattern:

string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");

EDIT: A better one, which accepts more special characters in the protocol name:

string[] items = System.Text.RegularExpressions.Regex.Split(text, ";(?=[^;:]+:)");
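
For reference, here is a small self-contained sketch of the lookahead split run against the first sample string from the question, assuming C# 9 top-level statements. The lookahead only allows a split at a semicolon that is directly followed by a name and a colon, so the semicolons inside the X400 address are left alone.

```csharp
using System;
using System.Text.RegularExpressions;

string text = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";

// Split only at a ';' that is followed by "name:" (no ';' or ':' inside the name).
string[] items = Regex.Split(text, ";(?=[^;:]+:)");

foreach (string item in items)
{
    Console.WriteLine(item);
}
// prints:
// smtp:jblack@test.com
// SMTP:jb@test.com
// X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
```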

See http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx: you can specify the maximum number of substrings you want, so in your case you would do:

string[] bits = text.Split(new char[] { ';' }, 3);

This caught my curiosity... So this code actually does the job, but again, it wants tidying :)

My final attempt - stop changing what you need ;=)

static void Main(string[] args)
{
    string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:jblack@test.com;X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:jb@test.com;";

    string[] parts = fneh.Split(new char[] { ';' });

    List<string> addresses = new List<string>();
    StringBuilder address = new StringBuilder();
    foreach (string part in parts)
    {
        if (part.Contains(":"))
        {
            if (address.Length > 0)
            {
                addresses.Add(semiColonCorrection(address.ToString()));
            }
            address = new StringBuilder();
            address.Append(part);
        }
        else
        {
            address.AppendFormat(";{0}", part);
        }
    }
    addresses.Add(semiColonCorrection(address.ToString()));

    foreach (string emailAddress in addresses)
    {
        Console.WriteLine(emailAddress);
    }
    Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
    if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
    {
        return string.Format("{0};", address);
    }
    else
    {
        return address;
    }
}

Not the fastest if you are doing this a lot, but I believe it will work for all cases.

        string input1 = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
        string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack@test.com;SMTP:jb@test.com";
        Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");

        List<string> sets = new List<string>();

        while (input2.Length > 0)
        {
            Match m1 = splitEmailRegex.Matches(input2)[0];
            string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
            sets.Add(s1);
            input2 = input2.Substring(s1.Length);
        }

        foreach (var set in sets)
        {
            Console.WriteLine(set);
        }

        Console.ReadLine();

Of course, many will say of regex: "Now you have two problems." There may even be a better regex answer than this.

You could always split on the colon and use a little logic to grab the key and value.

string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
    string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
    string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
    values.Add(String.Concat(key, ":", value));
}

Tested it with both of your samples and it works fine.

Try these regexes. You can extract what you're looking for using named groups.

X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)

Make sure you specify case-insensitive matching when constructing them. They seem to work with the samples you gave.

Lots of attempts. Here is mine ;)

string src = "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";

Regex r = new Regex(@"
   (?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
   (?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
   (?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);

foreach (Match m in r.Matches(src))
{
    if (m.Groups["smtp"].Captures.Count != 0)
        Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
    else if (m.Groups["X400"].Captures.Count != 0)
        Console.WriteLine("X400: {0}", m.Groups["X400"]);
    else if (m.Groups["X500"].Captures.Count != 0)
        Console.WriteLine("X500: {0}", m.Groups["X500"]);   
}

This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address, ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.

This works!

    string input =
        "smtp:jblack@test.com;SMTP:jb@test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
    string[] parts = input.Split(';');
    List<string> output = new List<string>();
    foreach(string part in parts)
    {
        if (part.Contains(":"))
        {
            output.Add(part + ";");
        }
        else if (part.Length > 0)
        {
            output[output.Count - 1] += part + ";";
        }
    }
    foreach(string s in output)
    {
        Console.WriteLine(s);
    }

Do the semicolon (;) split and then loop over the result, re-combining each element that contains no colon (:) with the previous element.

string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
  +"Black;;smtp:jblack@test.com;SMTP:jb@test.com";

string[] rawSplit = input.Split(';');

List<string> result = new List<string>();
  //now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
  if (buffer == string.Empty)
  {
    buffer = s;
  }
  else if (s.Contains(':'))
  {   
    result.Add(buffer);
    buffer = s;
  }
  else
  {
    buffer += ";" + s;
  }
}
result.Add(buffer);

foreach (string s in result)
  Console.WriteLine(s);

Here is another possible solution.

string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });

bits[0], bits[1], and bits[2] will then contain the three parts in the order from your original string.
