
How to split a string every four words in C#?

I'm using C# 4.0 and have run into a situation where I need to split a whole string every four words and store the pieces in a List object. For example, if my string contains "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53", the result should be:

USD 1.23 1.12 1.42
EUR 0.2 0.3 0.42
JPY 1.2 1.42 1.53

The result should be stored in a List<string>. I have tried the following:

List<string> test = new List<string>(data.Split(' ')); //(not working as it splits on every word)

With a little LINQ magic:

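// requires: using System.Linq;
var wordGroups = text.Split(' ')
                     .Select((word, i) => new { Word = word, Pos = i })      // pair each word with its index
                     .GroupBy(w => w.Pos / 4)                                 // indices 0-3 -> group 0, 4-7 -> group 1, ...
                     .Select(g => string.Join(" ", g.Select(x => x.Word)))    // rejoin each group with spaces
                     .ToList();                                               // List<string>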
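To reuse the LINQ version, it can be wrapped in a method. This is a minimal sketch: `WordGrouping.SplitIntoGroups` and its `groupSize` parameter are hypothetical names, not part of the answer above, and it assumes that repeated spaces should be ignored (hence `StringSplitOptions.RemoveEmptyEntries`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class WordGrouping
{
    // Hypothetical helper: splits text on spaces and rejoins every groupSize words.
    public static List<string> SplitIntoGroups(string text, int groupSize)
    {
        if (groupSize <= 0) throw new ArgumentOutOfRangeException("groupSize");

        // RemoveEmptyEntries (an assumption) keeps double spaces from producing empty "words".
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Select((word, i) => new { Word = word, Pos = i })
                   .GroupBy(w => w.Pos / groupSize)
                   .Select(g => string.Join(" ", g.Select(x => x.Word)))
                   .ToList();
    }
}

class Program
{
    static void Main()
    {
        string text = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
        foreach (string line in WordGrouping.SplitIntoGroups(text, 4))
            Console.WriteLine(line);
    }
}
```

`GroupBy` yields groups in the order their keys first appear and keeps elements in source order within each group, so the blocks come out in the same order as in the input string.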

Of course, my answer is not as glamorous as the LINQ ones, but I wanted to post this old-school method.

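using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        List<string> result = new List<string>();

        string inp = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
        while (true)
        {
            // Find the 4th space: everything before it is one block of four words.
            int pos = IndexOfN(inp, " ", 4);
            if (pos != -1)
            {
                string part = inp.Substring(0, pos);
                inp = inp.Substring(pos + 1);   // continue right after that space
                result.Add(part);
            }
            else
            {
                // Fewer than four spaces left: the remainder is the last block.
                result.Add(inp);
                break;
            }
        }

        foreach (string line in result)
            Console.WriteLine(line);
    }

    // Returns the index of the count-th occurrence of sep in input, or -1 if there are fewer.
    static int IndexOfN(string input, string sep, int count)
    {
        int pos = input.IndexOf(sep);
        count--;
        while (pos > -1 && count > 0)
        {
            pos = input.IndexOf(sep, pos + 1);
            count--;
        }
        return pos;
    }
}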

EDIT: If you have no control over how many numbers follow each currency code in the input string (for example, if some currency has only one or two values), then splitting the input into blocks of four words cannot work correctly. In that case we can resort to a regex:

// requires: using System.Text.RegularExpressions;
List<string> result = new List<string>();

string rExp = @"[A-Z]{1,3}(\d|\s|\.)+";
// --- EUR with only two numeric values ---
string inp = "USD 1.23 1.12 1.42 EUR 0.2 0.42 JPY 1.2 1.42 1.53";
Regex r = new Regex(rExp);
var m = r.Matches(inp);
foreach (Match h in m)
    result.Add(h.Value.Trim());   // every match except the last ends with a space

This pattern also accepts numbers with a comma as the decimal separator, as well as currency codes without any numbers (e.g. "GBP USD 1,23 1,12 1.42 "):

string rExp = @"[A-Z]{1,3}(,|\d|\s|\.)*"; 
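A self-contained sketch of the comma-tolerant pattern applied to the sample string above. `CurrencyBlocks.Extract` is a hypothetical wrapper, and trimming each match is an assumption (the pattern itself captures the trailing space). Like the answer, it assumes currency codes are runs of one to three uppercase letters.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CurrencyBlocks
{
    // Up to three capital letters, then any run of commas, digits, whitespace or dots.
    // The '*' (instead of '+') lets a code match even with no numbers after it.
    static readonly Regex Pattern = new Regex(@"[A-Z]{1,3}(,|\d|\s|\.)*");

    // Hypothetical helper: returns one trimmed string per currency block.
    public static List<string> Extract(string input)
    {
        List<string> result = new List<string>();
        foreach (Match m in Pattern.Matches(input))
            result.Add(m.Value.Trim()); // drop the trailing space the match picks up
        return result;
    }
}

class Program
{
    static void Main()
    {
        foreach (string block in CurrencyBlocks.Extract("GBP USD 1,23 1,12 1.42 "))
            Console.WriteLine("[" + block + "]");
    }
}
```

Here "GBP" comes back as a block of its own, followed by "USD 1,23 1,12 1.42".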

See also: Regular Expression Language - Quick Reference

The simplest way is to first split the string into a list of words, then write a small loop that recombines each group of four words.
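A minimal sketch of that split-then-loop idea, assuming words are separated by spaces; `LoopGrouping.Group` is a hypothetical name. It uses the `string.Join(separator, array, startIndex, count)` overload to join a slice of the word array directly.

```csharp
using System;
using System.Collections.Generic;

static class LoopGrouping
{
    // Splits on spaces, then rebuilds one string per groupSize words with a plain for loop.
    public static List<string> Group(string text, int groupSize)
    {
        if (groupSize <= 0) throw new ArgumentOutOfRangeException("groupSize");

        // RemoveEmptyEntries (an assumption) keeps double spaces from producing empty "words".
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        List<string> result = new List<string>();
        for (int i = 0; i < words.Length; i += groupSize)
        {
            // The last group may hold fewer than groupSize words.
            int count = Math.Min(groupSize, words.Length - i);
            result.Add(string.Join(" ", words, i, count));
        }
        return result;
    }
}

class Program
{
    static void Main()
    {
        string text = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
        foreach (string line in LoopGrouping.Group(text, 4))
            Console.WriteLine(line);
    }
}
```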

The Reactive Framework team has a bunch of extensions for IEnumerable<T>. One of them is Buffer, which does exactly what you want, very simply.

Here it is:

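// Buffer comes from the Interactive Extensions (Ix, System.Interactive package), not from plain LINQ
var text = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
var result = text.Split(' ').Buffer(4);   // IEnumerable<IList<string>>: one list per four words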

And that gives:

[Image: result]
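If you would rather not add the Interactive Extensions just for Buffer, a small iterator gives the same chunking. This is a sketch under that assumption: `Batch` is a hypothetical extension method written here, not Ix's API, and it returns `List<T>` chunks rather than `IList<T>`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BatchExtensions
{
    // Hypothetical stand-in for Buffer(count): yields consecutive,
    // non-overlapping chunks of `size` items; the last chunk may be smaller.
    public static IEnumerable<List<T>> Batch<T>(this IEnumerable<T> source, int size)
    {
        // Validate eagerly, then hand off to the lazy iterator.
        if (source == null) throw new ArgumentNullException("source");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        return BatchIterator(source, size);
    }

    private static IEnumerable<List<T>> BatchIterator<T>(IEnumerable<T> source, int size)
    {
        List<T> current = new List<T>(size);
        foreach (T item in source)
        {
            current.Add(item);
            if (current.Count == size)
            {
                yield return current;
                current = new List<T>(size); // start a fresh chunk
            }
        }
        if (current.Count > 0)
            yield return current; // leftover items form the final, shorter chunk
    }
}

class Program
{
    static void Main()
    {
        string text = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
        List<string> result = text.Split(' ')
                                  .Batch(4)
                                  .Select(words => string.Join(" ", words))
                                  .ToList();
        foreach (string line in result)
            Console.WriteLine(line);
    }
}
```

On .NET 6 and later, the built-in `Enumerable.Chunk(size)` does the same job and returns arrays.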

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address.

 
© 2020-2024 STACKOOM.COM