How to split a string at every four words in C#?

I'm using C# 4.0 and have come across a situation where I have to split a whole string at every fourth word and store the pieces in a List object. So suppose my string contains: "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53", the result should be:

USD 1.23 1.12 1.42
EUR 0.2 0.3 0.42
JPY 1.2 1.42 1.53

The result should be saved into a List object. I have tried the following:

List<string> test = new List<string>(data.Split(' ')); //(not working as it splits on every word)

With a little LINQ magic:

var wordGroups = text.Split(' ')
                     .Select((word, i) => new { Word = word, Pos = i })
                     .GroupBy(w => w.Pos / 4)
                     .Select(g => string.Join(" ", g.Select(x=> x.Word)))
                     .ToList();
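
The same query can be wrapped in a reusable method. This is only a sketch: the class name `WordGrouping`, the method name `GroupWords`, and the `size` parameter are illustrative and not part of the original answer. It also assumes that runs of repeated spaces should be ignored, which the original `Split(' ')` does not do.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WordGrouping
{
    // Hypothetical helper: joins the words of `text` into groups of `size` words.
    public static List<string> GroupWords(string text, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");

        return text
            // Assumption: empty tokens caused by repeated spaces are dropped.
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Pair each word with its zero-based position.
            .Select((word, i) => new { Word = word, Pos = i })
            // Integer division sends positions 0..size-1 to group 0,
            // size..2*size-1 to group 1, and so on.
            .GroupBy(w => w.Pos / size)
            // Rebuild each group as a single space-separated string.
            .Select(g => string.Join(" ", g.Select(x => x.Word)))
            .ToList();
    }
}
```

Calling `WordGrouping.GroupWords(text, 4)` on the sample string should give the three lines from the question. A trailing group with fewer than `size` words is kept as a shorter entry.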

Of course my answer is not as glamorous as the LINQ ones, but I want to post this old-school method.

void Main()
{
    List<string> result = new List<string>();

    string inp = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
    while(true)
    {
        int pos = IndexOfN(inp, " ", 4);
        if(pos != -1)
        {
            string part = inp.Substring(0, pos);
            inp = inp.Substring(pos + 1);
            result.Add(part);
        }
        else
        {
            result.Add(inp);
            break;
        }
    }
}

int IndexOfN(string input, string sep, int count)
{
    int pos = input.IndexOf(sep);
    count--;
    while(pos > -1 && count > 0)
    {
        pos = input.IndexOf(sep, pos+1);
        count--;
    }
    return pos ;
}

EDIT: If there is no control over how many numbers appear in the input string (for example, if some currencies have only one or two values), then there is no way to cut the input string correctly into blocks of four. We can resort to Regex:

List<string> result = new List<string>();

string rExp = @"[A-Z]{1,3}(\d|\s|\.)+";
// --- EUR with only two numeric values---
string inp = "USD 1.23 1.12 1.42 EUR 0.2 0.42 JPY 1.2 1.42 1.53";
Regex r = new Regex(rExp);
var m = r.Matches(inp);
foreach(Match h in m)
   result.Add(h.Value.Trim()); // the match also swallows the space before the next currency code

This pattern also accepts numbers with a comma as the decimal separator, and currency symbols without any numbers (for example "GPB USD 1,23 1,12 1.42 "):

string rExp = @"[A-Z]{1,3}(,|\d|\s|\.)*"; 
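
To see how this second pattern behaves on irregular input, here is a minimal sketch. The class name `CurrencyGrouping` and method name `SplitByCurrencyCode` are made up for illustration. The `Trim()` call is an addition: the pattern's `\s` alternative also consumes the whitespace that precedes the next currency code.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CurrencyGrouping
{
    // Pattern from the answer above: one to three capital letters followed
    // by any run of digits, whitespace, dots, or commas.
    private static readonly Regex GroupPattern =
        new Regex(@"[A-Z]{1,3}(,|\d|\s|\.)*");

    // Hypothetical helper: returns one entry per currency code found in `input`.
    public static List<string> SplitByCurrencyCode(string input)
    {
        var result = new List<string>();
        foreach (Match m in GroupPattern.Matches(input))
        {
            // Trim the whitespace the pattern captured at the end of each match.
            result.Add(m.Value.Trim());
        }
        return result;
    }
}
```

With the input "GPB USD 1,23 1,12 1.42 " this should produce two entries, "GPB" and "USD 1,23 1,12 1.42". Note that the pattern relies on currency codes being the only capital letters in the input.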

RegEx Expression Language - Quick Reference

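The simplest approach is to first split the string into a list of individual words, and then write a small loop that joins the words back together in groups of four.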
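
That answer gives no code, so the following is one possible reading of it rather than the answerer's own implementation. The names `LoopGrouping` and `SplitEveryNWords` are illustrative, and dropping empty tokens from repeated spaces is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class LoopGrouping
{
    // Hypothetical helper: splits `text` into words, then rejoins them
    // `size` words at a time.
    public static List<string> SplitEveryNWords(string text, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException("size");

        // Step 1: split into individual words (assumption: skip empty tokens).
        string[] words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Step 2: walk the array in steps of `size` and join each slice.
        var result = new List<string>();
        for (int start = 0; start < words.Length; start += size)
        {
            // The last slice may hold fewer than `size` words.
            int count = Math.Min(size, words.Length - start);
            result.Add(string.Join(" ", words, start, count));
        }
        return result;
    }
}
```

The `string.Join(separator, array, startIndex, count)` overload avoids building an intermediate list for each group.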

The Reactive Framework guys have a bunch of extensions for IEnumerable<T>. One of them is Buffer, which does exactly what you want, very simply.

Here it is:

var text = "USD 1.23 1.12 1.42 EUR 0.2 0.3 0.42 JPY 1.2 1.42 1.53";
var result = text.Split(' ').Buffer(4);

And that gives three buffers of four words each: USD 1.23 1.12 1.42, then EUR 0.2 0.3 0.42, then JPY 1.2 1.42 1.53.
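
Buffer yields groups of words rather than strings, so each group still has to be joined to get the List<string> the question asks for. The sketch below avoids the extra package by using a hand-written stand-in; `InChunksOf` and `JoinEveryFourWords` are hypothetical names, and the stand-in only approximates what Buffer does for a fixed count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChunkExtensions
{
    // Hypothetical stand-in for Buffer(count): yields consecutive lists of
    // up to `size` items, in source order.
    public static IEnumerable<IList<T>> InChunksOf<T>(this IEnumerable<T> source, int size)
    {
        if (source == null) throw new ArgumentNullException("source");
        if (size <= 0) throw new ArgumentOutOfRangeException("size");
        return ChunkIterator(source, size);
    }

    // Separate iterator so the argument checks above run eagerly.
    private static IEnumerable<IList<T>> ChunkIterator<T>(IEnumerable<T> source, int size)
    {
        var chunk = new List<T>(size);
        foreach (T item in source)
        {
            chunk.Add(item);
            if (chunk.Count == size)
            {
                yield return chunk;
                chunk = new List<T>(size);
            }
        }
        // Emit the final, possibly shorter, chunk.
        if (chunk.Count > 0) yield return chunk;
    }

    // Joins every four words of `text` back into one string per group.
    public static List<string> JoinEveryFourWords(string text)
    {
        return text.Split(' ')
                   .InChunksOf(4)
                   .Select(c => string.Join(" ", c))
                   .ToList();
    }
}
```

With the real Buffer extension available, the last method would presumably read `text.Split(' ').Buffer(4).Select(b => string.Join(" ", b)).ToList()`.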
