
How to split a string every time the character changes?

In C#, I'd like to turn a string such as abbbbcc into an array like [a,bbbb,cc]. I have tried the regex from this Java question like so:

using System.Text.RegularExpressions;

var test = "abbbbcc";
var split = new Regex("(?<=(.))(?!\\1)").Split(test);

but this results in the sequence [a,a,bbbb,b,cc,c] for me. How can I achieve the same result in C#?

Here is a LINQ solution that uses Aggregate:

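using System.Linq;

var input = "aabbaaabbcc";
var result = input
    // Insert a space whenever the next character differs from the last one appended.
    .Aggregate(" ", (seed, next) => seed + (seed.Last() == next ? "" : " ") + next)
    .Trim()        // drop the leading spaces introduced by the " " seed
    .Split(' ');   // each remaining space marks a boundary between runs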

It appends each character to the accumulating string, inserting a space first whenever the character differs from the last one appended. At the end, I trim the leading spaces and split it all with the normal String.Split.

Result:

["aa", "bb", "aaa", "bb", "cc"]
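One caveat with the space-delimiter trick above: it assumes the input contains no spaces, because a space in the input is indistinguishable from the inserted separators (for "a a" you'd get ["a", "", "", "a"]). A minimal sketch of the same Aggregate idea that avoids this, assuming it's fine to accumulate into a List<string> instead of a string. SplitRuns is a hypothetical helper name, not part of the answer above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sketch (hypothetical helper name SplitRuns): same Aggregate idea, but the
// seed is a List<string> of runs rather than a space-delimited string, so
// spaces in the input are treated like any other character.
List<string> SplitRuns(string input) =>
    input.Aggregate(new List<string>(), (runs, next) =>
    {
        // Extend the last run if the character repeats, otherwise start a new run.
        if (runs.Count > 0 && runs[runs.Count - 1][0] == next)
            runs[runs.Count - 1] += next;
        else
            runs.Add(next.ToString());
        return runs;
    });

Console.WriteLine(string.Join(", ", SplitRuns("aabbaaabbcc")));     // aa, bb, aaa, bb, cc
Console.WriteLine("[" + string.Join("|", SplitRuns("a  b")) + "]"); // [a|  |b]
```

A side effect of this design is that an empty input gives an empty list, whereas the space-delimited version returns a single empty string.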

I don't know how to do it with Split, but this may be a good alternative:

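using System.Linq;
using System.Text.RegularExpressions;

var test = "aabbbbcc";
// (.)\1* matches one character followed by any number of repeats of it.
var matches = Regex.Matches(test, "(.)\\1*");
var split = matches.Cast<Match>().Select(match => match.Value).ToList();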
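One thing to be aware of with this approach: by default, . in a .NET regex does not match \n, so (.)\1* silently skips runs of newlines instead of returning them. A sketch of the same Matches-based projection with RegexOptions.Singleline (the option Gusman's answer also passes), so that every character, including \n, ends up in some run. SplitRuns is only an illustrative name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sketch: the Matches-based approach with RegexOptions.Singleline, so '.'
// also matches '\n' and newline runs are returned instead of skipped.
// SplitRuns is an illustrative name only.
string[] SplitRuns(string input) =>
    Regex.Matches(input, @"(.)\1*", RegexOptions.Singleline)
         .Cast<Match>()
         .Select(m => m.Value)
         .ToArray();

var text = "aa\n\nb";
Console.WriteLine(SplitRuns(text).Length);               // 3: "aa", "\n\n", "b"
Console.WriteLine(Regex.Matches(text, @"(.)\1*").Count); // 2: the newlines are skipped
```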

There are several things going on here that are producing the output you're seeing:

  1. The regex combines a positive lookbehind and a negative lookahead to find every position where the character just before it is not repeated just after it. The match itself is zero-width: it consumes no characters.

  2. The lookbehind contains a capture group, so every match captures the preceding character. The capture group is required by the negative lookahead: \\1 is a backreference meaning "the value of the first capture group in the pattern", so it cannot be omitted.

  3. When the pattern contains capture groups, Regex.Split includes the captured text in the result array, right after the piece of text that precedes each split point.

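Number 3 is why your string array looks weird: Split splits at the zero-width position right after the last a, so everything before that position becomes split[0]. It is followed by the captured delimiter character at split[1], and so on. Because the pattern also matches at the very end of the string, the array ends with an empty string.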
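To see point 3 in isolation, here is a small sketch that prints every element Regex.Split returns for the question's pattern, using abbbbcc from the first sentence of the question:

```csharp
using System;
using System.Text.RegularExpressions;

// Sketch: list every element Regex.Split returns for the question's pattern.
string[] parts = Regex.Split("abbbbcc", @"(?<=(.))(?!\1)");

for (int i = 0; i < parts.Length; i++)
{
    // Odd indexes hold the character captured by group 1 at each split point;
    // even indexes hold the text between split points (the last one is "").
    Console.WriteLine($"split[{i}] = \"{parts[i]}\"");
}
```

This should print split[0] through split[6] as "a", "a", "bbbb", "b", "cc", "c" and "". The captured characters sit at the odd indexes, which is what an index % 2 filter can discard.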

There is no way to override this behaviour when calling Split. Either compensating for it, as in Gusman's answer, or projecting the results of a Matches call, as in Ruard's answer, will get you what you want.

To be honest, I don't fully understand how that regex works, but you can easily "repair" the output:

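using System.Linq;
using System.Text.RegularExpressions;
Regex reg = new Regex("(?<=(.))(?!\\1)", RegexOptions.Singleline);
var res = reg.Split("aaabbcddeee").Where((value, index) => index % 2 == 0 && value != "").ToArray(); // keep text pieces, drop captured chars and the trailing ""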

You could do this easily with LINQ. I don't expect its runtime to be as good as the regex's, but it's a whole lot easier to read:

using System.Linq;

var myString = "aaabbccccdeee";
// Group identical characters, then rebuild each group as a string.
var splits = myString.ToCharArray()
    .GroupBy(chr => chr)
    .Select(grp => new string(grp.Key, grp.Count()));

returns the values ["aaa", "bb", "cccc", "d", "eee"].

However, this won't work for a string like "aabbaa": GroupBy merges equal characters from anywhere in the string, not just adjacent ones, so you'll get ["aaaa", "bb"] instead of ["aa", "bb", "aa"].
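GroupBy collects equal keys from anywhere in the sequence, which is why "aabbaa" collapses. A minimal sketch that only merges adjacent equal characters, using a plain loop with a StringBuilder instead of LINQ (a swapped-in technique, not part of the answer above; SplitOnChange is a hypothetical helper name):

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Sketch (hypothetical helper name SplitOnChange): walk the string once and
// start a new piece whenever the current character differs from the previous
// one. Unlike GroupBy, only adjacent equal characters are merged.
List<string> SplitOnChange(string input)
{
    var result = new List<string>();
    var current = new StringBuilder();

    foreach (char c in input)
    {
        // A different character ends the current run.
        if (current.Length > 0 && current[current.Length - 1] != c)
        {
            result.Add(current.ToString());
            current.Clear();
        }
        current.Append(c);
    }

    // Flush the final run (nothing to flush for an empty input).
    if (current.Length > 0)
        result.Add(current.ToString());

    return result;
}

Console.WriteLine(string.Join(", ", SplitOnChange("aabbaa"))); // aa, bb, aa
```

It makes a single pass over the input and needs no regex, at the cost of a few more lines than the LINQ version.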

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM