
Split string into tokens using 2 or more spaces

I have a text file that I am trying to parse. Right now I am using the String.Split method to tokenize each line.

Here is some sample text:

  GP71011N                  Sign                        1.00 each    4298.96000       4298.96
  H50 ENGINE*               Sign                        1.00 each    9027.65000       9027.65
  JR70883*                  Sign                        1.00 each   10674.24300      10674.24
  KE31453                   Sign                        1.00 each    1000.00000       1000.00
  MK51645                   Sign                        6.00 each   13718.06000      82308.36
  MK51649                   Sign                        1.00 each   14331.08000      14331.08
  MK51722                   Sign                        4.00 each   13186.00000      52744.00
  ML51651                   Sign                        5.00 each   15988.00000      79940.00

Right now I read the file line by line, collapse every run of whitespace into a single space, and then split the line on single spaces. Now that I look at the data, that will not work, because some values (for example "H50 ENGINE*" and "1.00 each") contain a single space themselves.

This is my current code:

string output = "";

string currentPCat = "";
string currentAccount = "";

bool IsValidLine = false;
var lineNo = 1;
while ((line = file.ReadLine()) != null)
{
    if(lineNo <= 36)
    {
        lineNo++;
    }
    else
    {
        line = Regex.Replace(line, @"\s+", " ");
        var tokens = line.Split(' ');
        if (tokens.Count() >= 4 && tokens.Contains("PCAT:"))
        {
            currentPCat = tokens[1];
            currentAccount = tokens[2];
            IsValidLine = true;
        }
        else if (tokens.Count() == 7)
        {
            if (IsValidLine)
            {
                output = output + currentPCat + "," + currentAccount + "," + tokens[1] + "," + tokens[2] + "," + tokens[3] + "," + tokens[4] + "," + tokens[5] + "," + tokens[6] + "\r\n";
            }
        }
        else
        {
            IsValidLine = false;
        }
        lineNo++;
    }
}

The part I really need to change is the tokenizer, that is, these two lines:

line = Regex.Replace(line, @"\s+", " ");
var tokens = line.Split(' ');

I think I need to remove the first line, and I want the string split wherever there are 2 or more spaces. How can I do this?

Sure - use the overload of String.Split that takes string delimiters instead of char delimiters:

var tokens = line.Split(new string[] {"  "}, StringSplitOptions.RemoveEmptyEntries)
                 .Select(s => s.Trim())
                 .ToArray();

The Trim() is necessary to remove leading/trailing spaces when there is an odd number of spaces between segments.
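As a minimal sketch of how this might be wrapped up for the sample data (the class and method names are hypothetical, and the extra Where filter is an addition that is not part of the answer above): an odd-length run of spaces at the very end of a line can leave a lone space that trims down to an empty string, so the sketch drops those.

```csharp
using System;
using System.Linq;

public static class DoubleSpaceSplitter
{
    // Hypothetical helper, not from the original answer.
    // Splits on two-space delimiters, trims the single space left over
    // from odd-length runs, and drops pieces that end up empty.
    public static string[] Tokenize(string line)
    {
        return line.Split(new string[] { "  " }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(s => s.Trim())
                   .Where(s => s.Length > 0)
                   .ToArray();
    }
}
```

For the sample rows this should give five tokens per line, with "H50 ENGINE*" and "1.00 each" each kept as a single token, so the tokens.Count() == 7 check and the tokens[1] to tokens[6] indices in the question's loop would need to be adjusted to match.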

Instead of String.Split, use Regex.Split and pass "\\s{2,}" as the pattern:

string[] tokens = Regex.Split(line, @"\s{2,}");
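One thing to watch for: Regex.Split returns an empty string as the first element when the input starts with a match, and the sample lines begin with two spaces. A small sketch that trims the line first, assuming that leading and trailing whitespace carry no meaning in this file (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitter
{
    // Hypothetical helper, not from the original answer.
    // Trims the line so leading/trailing whitespace does not produce
    // empty tokens, then splits on runs of two or more whitespace characters.
    public static string[] Tokenize(string line)
    {
        return Regex.Split(line.Trim(), @"\s{2,}");
    }
}
```

Note that \s also matches tabs and other whitespace, not only spaces; if only literal spaces should count as separators, the pattern " {2,}" may be a closer fit.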
