
How to split a string every time the character changes?

I'd like to turn a string such as abbbbcc into an array like this: [a,bbbb,cc] in C#. I have tried the regex from this Java question like so:

var test = "aabbbbcc";
var split = new Regex("(?<=(.))(?!\\1)").Split(test);

but this results in the sequence [a,a,bbbb,b,cc,c] for me. How can I achieve the same result in C#?

Here is a LINQ solution that uses Aggregate:

var input = "aabbaaabbcc"; 
var result = input
    .Aggregate(" ", (seed, next) => seed + (seed.Last() == next ? "" : " ") + next)
    .Trim()
    .Split(' ');

It walks the characters one at a time and compares each with the last one appended to the accumulating string. When it meets a different character, it appends a space before it. Then I just split the result at the end using the normal String.Split.

Result:

["aa", "bb", "aaa", "bb", "cc"]
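Because the space doubles as the delimiter, the Aggregate approach above assumes the input itself contains no spaces. A minimal sketch of the same fold that accumulates into a list instead of a delimited string is shown here; the class and method names (AggregateRuns, Split) are hypothetical and not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AggregateRuns
{
    // Hypothetical helper: folds the characters of the input into a list of runs.
    public static List<string> Split(string input)
    {
        return input.Aggregate(new List<string>(), (runs, next) =>
        {
            if (runs.Count > 0 && runs[runs.Count - 1][0] == next)
                runs[runs.Count - 1] += next;   // same character: extend the current run
            else
                runs.Add(next.ToString());      // different character: start a new run
            return runs;
        });
    }
}
```

With this variant, AggregateRuns.Split("a  b") keeps the two spaces as a run of their own, and an empty input gives an empty list.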

I don't know how to get it done with split. But this may be a good alternative:

// requires: using System.Linq; using System.Text.RegularExpressions;

var test = "aabbbbcc";
var matches = Regex.Matches(test, "(.)\\1*");
var split = matches.Cast<Match>().Select(match => match.Value).ToList();

There are several things going on here that are producing the output you're seeing:

  1. The regex combines a positive lookbehind and a negative lookahead to find each zero-width position where the preceding character is not followed by the same character, that is, the end of each run.

  2. The lookbehind contains a capture group, so every match carries a captured value. The negative lookahead requires that group: the \\1 backreference means "the value of the first capture group in the pattern", so the group cannot be omitted.

  3. Regex.Split, when the pattern it splits on contains one or more capture groups, includes the captured text in the returned array after each piece it splits off.

Number 3 is why your string array looks odd: Split splits after the last a of the first run, and the text before that point becomes split[0]. This is followed by the captured character at split[1], and so on.

There is no way to override this behaviour when calling Split. Either compensating as in Gusman's answer or projecting the results of a Matches call as in Ruard's answer will get you what you want.
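To make the three points above concrete, here is a small sketch that puts the raw Regex.Split output next to the Regex.Matches projection. The class and method names (SplitDemo, Raw, Runs) are made up for illustration; the two patterns are the ones already used in this thread.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitDemo
{
    // Raw Regex.Split output: each piece is followed by the text captured by group 1.
    public static string[] Raw(string input) =>
        Regex.Split(input, "(?<=(.))(?!\\1)");

    // Matching whole runs instead of splitting avoids the extra entries.
    public static string[] Runs(string input) =>
        Regex.Matches(input, "(.)\\1*").Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "aabbbbcc", Raw should return aa, a, bbbb, b, cc, c and a trailing empty string, because the pattern also matches at the very end of the input. Runs should return just aa, bbbb, cc.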

To be honest I don't exactly understand how that regex works, but you can "repair" the output very easily:

Regex reg = new Regex("(?<=(.))(?!\\1)", RegexOptions.Singleline);
var res = reg.Split("aaabbcddeee").Where((value, index) => index % 2 == 0 && value != "").ToArray();

You could do this easily with LINQ, but I don't think its runtime will be as good as the regex.

It is a whole lot easier to read, though.

var myString = "aaabbccccdeee";
var splits = myString.ToCharArray()
    .GroupBy(chr => chr)
    .Select(grp => new string(grp.Key, grp.Count()));

This returns the values ['aaa', 'bb', 'cccc', 'd', 'eee'].

However, this won't work if the same character appears in more than one run. For a string like "aabbaa", you'll just get ["aaaa","bb"] instead of ["aa","bb","aa"], because GroupBy groups by key across the whole sequence, not by adjacency.
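One way around that limitation is to group only adjacent characters. The sketch below does this with a plain iterator rather than GroupBy; it is an illustration, and the names ConsecutiveRuns and Split are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ConsecutiveRuns
{
    // Hypothetical helper: yields each maximal run of identical adjacent characters.
    public static IEnumerable<string> Split(string input)
    {
        int start = 0;
        for (int i = 1; i <= input.Length; i++)
        {
            // A run ends at the end of the string or where the character changes.
            if (i == input.Length || input[i] != input[start])
            {
                yield return input.Substring(start, i - start);
                start = i;
            }
        }
    }
}
```

ConsecutiveRuns.Split("aabbaa") yields aa, bb, aa. It scans the string once and allocates only the resulting substrings.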
