

How can I split the following string into a string array?

I want to split the following:

name[]address[I]dob[]nationality[]occupation[]

So my results would be:

name[]
address[I]
dob[]
nationality[]
occupation[]

I have tried using Regex.Split but can't get these results.

You can use Regex.Split with the following regex:

(?<=])(?=[a-z])

which will split between a closing square bracket on the left and a letter on the right. This is done using lookaround assertions. They don't consume any characters of the match, so in this situation they're handy for matching a position between two characters.

Basically it means exactly what I wrote: (?<=]) matches a position in the string that is preceded by a closing bracket, while (?=[a-z]) matches a position that is followed by a lowercase letter (both are zero-width, i.e. they match between characters). You can tweak that a little if your input data looks different from what you gave us in the question.
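The same split in C# might look like this. It's a minimal sketch using the input from the question; the variable names are just for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

string input = "name[]address[I]dob[]nationality[]occupation[]";

// Split at every position that is preceded by ']' and followed by a lowercase letter.
// Both lookarounds are zero-width, so no characters are removed from the pieces.
string[] parts = Regex.Split(input, @"(?<=])(?=[a-z])");

foreach (string part in parts)
    Console.WriteLine(part);
```

Because nothing is consumed, every piece keeps its own brackets, which is exactly what the question asks for.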

You could also simplify it a little, at the expense of readability, by using (?<=])\b. But I would advise against that, since \b is tied to \w, which is usually a really ugly thing. It would work roughly the same, but not quite: \b in this context amounts to (?=\w), and \w matches a lot more things, namely decimal digits and the underscore too (and in .NET, letters from any script).
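To make that difference concrete, here is a small sketch with a made-up input (hypothetical, not from the question) where a digit and an underscore follow the closing brackets:

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: a digit and an underscore follow the closing brackets.
string input = "a[]1b[]_c[]";

// Explicit version: only split where a lowercase ASCII letter follows.
string[] letterOnly = Regex.Split(input, @"(?<=])(?=[a-z])");

// Shorter version: after ']' (a non-word character), \b means "a word character follows",
// and \w also covers digits and '_'.
string[] wordBoundary = Regex.Split(input, @"(?<=])\b");

Console.WriteLine(string.Join(" | ", letterOnly));   // a[]1b[]_c[]
Console.WriteLine(string.Join(" | ", wordBoundary)); // a[] | 1b[] | _c[]
```

The [a-z] version leaves the string whole, while the \b version splits before "1" and "_" as well.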

Quick PowerShell test (it uses the same regex implementation since it's .NET underneath):

PS> 'name[]address[I]dob[]nationality[]occupation[]' -split '(?<=])(?=[a-z])'
name[]
address[I]
dob[]
nationality[]
occupation[]

Just for completeness, there is also another option. You can either split the string between the tokens you want to retain, or you could just collect all matches of the tokens you want to keep. In the latter case you'll need a pattern that matches what you need, such as

[a-z]+\[[^\]]*]

or what Dennis gave as an answer (I just tend to avoid \w and \b except for quick and dirty hacks or golfing, since I maintain that they have no useful application). You can use that with Regex.Matches.

Generally both approaches work fine; the choice then depends on whether the split pattern or the match pattern is easier to understand. Note that Regex.Matches gives you Match objects, not a string[], so if you need an array you'll also have to project with .Select(m => m.Value).ToArray() (on .NET Framework, preceded by .Cast<Match>(), since MatchCollection there only implements the non-generic IEnumerable).
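A minimal sketch of the match-based approach with the [a-z]+\[[^\]]*] pattern, including the projection to string[] (input taken from the question):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "name[]address[I]dob[]nationality[]occupation[]";

// Match each token instead of splitting: one or more lowercase letters, '[',
// any run of characters other than ']', then the closing ']'.
string[] tokens = Regex.Matches(input, @"[a-z]+\[[^\]]*]")
    .Cast<Match>()            // needed on .NET Framework; harmless on .NET Core / .NET 5+
    .Select(m => m.Value)     // Match -> string
    .ToArray();

Console.WriteLine(string.Join(Environment.NewLine, tokens));
```

Unlike the split pattern, this one also allows arbitrary text (including spaces) inside the brackets, because [^\]]* only stops at the next ']'.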

In this case I'd say neither regex should be left without a comment explaining what it does. I can read them just fine, but many developers are a little uneasy around regexes, and more advanced concepts like lookaround in particular often warrant an explanation.
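One way to keep that explanation next to the regex itself is .NET's free-spacing mode. This is a sketch, assuming you're fine with RegexOptions.IgnorePatternWhitespace: with it, unescaped whitespace in the pattern is ignored and '#' starts a comment that runs to the end of the line. The variable name splitBetweenFields is just illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

// The split pattern from above, written so that each part carries its own comment.
var splitBetweenFields = new Regex(@"
    (?<=])      # position right after a closing bracket
    (?=[a-z])   # and right before a lowercase letter (start of the next field name)
", RegexOptions.IgnorePatternWhitespace);

string[] parts = splitBetweenFields.Split("name[]address[I]dob[]nationality[]occupation[]");
Console.WriteLine(string.Join(", ", parts));
```

The compiled pattern is the same as (?<=])(?=[a-z]); only the layout differs.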

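// Split on ']' (dropping the empty entry after the final ']'), then re-append the bracket; needs System.Linq.
text.Split(new[] { ']' }, StringSplitOptions.RemoveEmptyEntries).Select(s => s + "]").ToArray();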
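A self-contained sketch of that non-regex approach. The second input is hypothetical and only there to show an assumption baked into it: every piece is expected to end in ']'.

```csharp
using System;
using System.Linq;

string text = "name[]address[I]dob[]nationality[]occupation[]";

// Split on ']' and drop the empty entry that follows the final ']',
// then re-append the ']' that Split removed from each piece.
string[] parts = text.Split(new[] { ']' }, StringSplitOptions.RemoveEmptyEntries)
                     .Select(s => s + "]")
                     .ToArray();

Console.WriteLine(string.Join(", ", parts));

// Caveat (hypothetical input): a trailing fragment without ']' also gets one appended.
string[] odd = "name[]extra".Split(new[] { ']' }, StringSplitOptions.RemoveEmptyEntries)
                            .Select(s => s + "]")
                            .ToArray();
Console.WriteLine(string.Join(", ", odd)); // name[], extra]
```

For well-formed input like the question's, this gives the same five tokens as the regex versions.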

Use this regex pattern:

\w*\[\w*\]
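A sketch of how that pattern could be used with Regex.Matches. The second input is hypothetical and shows a limitation: \w does not match a space, so a value like "John Smith" inside the brackets is not picked up.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// \w*      the field name (word characters, possibly none)
// \[\w*\]  a bracket pair containing only word characters (possibly none)
const string Pattern = @"\w*\[\w*\]";

string[] tokens = Regex.Matches("name[]address[I]dob[]nationality[]occupation[]", Pattern)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", tokens));

// Hypothetical input: the space inside the brackets is not a word character,
// so this token is not matched at all.
int spaced = Regex.Matches("name[John Smith]", Pattern).Count;
Console.WriteLine(spaced); // 0
```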

A regular expression should be fine. You could also consider locating the opening and closing square brackets with string.IndexOf, for example:

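// Requires: using System.Collections.Generic;
// Yields each "name[...]" token by finding the next '[' and the ']' that follows it.
IEnumerable<string> Results(string input)
{
    int start = 0;
    while (start < input.Length)
    {
        // char overloads of IndexOf are ordinal, so no culture-sensitive comparison is involved.
        int openingBracketIndex = input.IndexOf('[', start);
        if (openingBracketIndex == -1)
            yield break;

        int closingBracketIndex = input.IndexOf(']', openingBracketIndex + 1);
        if (closingBracketIndex == -1)
            yield break;

        // The token runs from the current position up to and including ']'.
        yield return input.Substring(start, closingBracketIndex - start + 1);
        start = closingBracketIndex + 1;
    }
}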
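Since that method returns a lazy IEnumerable<string>, you'd still call ToArray() to get the string[] from the question. A self-contained sketch, with the IndexOf-based iterator reproduced as a local function so it runs on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Lazily yields each "name[...]" token by locating bracket pairs with IndexOf.
IEnumerable<string> Results(string input)
{
    int start = 0;
    while (start < input.Length)
    {
        int open = input.IndexOf('[', start);
        if (open == -1) yield break;
        int close = input.IndexOf(']', open + 1);
        if (close == -1) yield break;
        yield return input.Substring(start, close - start + 1);
        start = close + 1;
    }
}

string[] parts = Results("name[]address[I]dob[]nationality[]occupation[]").ToArray();
Console.WriteLine(string.Join(", ", parts));
```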
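
string inputString = "name[]address[I]dob[]nationality[]occupation[]";
// Lazily match everything up to each "[]" or "[I]"; this pattern only allows an optional "I" between the brackets.
var result = Regex.Matches(inputString, @".*?\[I?\]").Cast<Match>().Select(m => m.Value).ToArray();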
