I have a text file that I am trying to parse. Right now I am using the String.Split method to tokenize each line.
Here is some sample text:
GP71011N Sign 1.00 each 4298.96000 4298.96
H50 ENGINE* Sign 1.00 each 9027.65000 9027.65
JR70883* Sign 1.00 each 10674.24300 10674.24
KE31453 Sign 1.00 each 1000.00000 1000.00
MK51645 Sign 6.00 each 13718.06000 82308.36
MK51649 Sign 1.00 each 14331.08000 14331.08
MK51722 Sign 4.00 each 13186.00000 52744.00
ML51651 Sign 5.00 each 15988.00000 79940.00
Right now I am reading the file line by line, replacing every run of whitespace with a single space, and then splitting the line on that single space. Now that I look at it, that will not work, because some values (such as "H50 ENGINE*") contain a space themselves.
This is my current code:
string output = "";
string currentPCat = "";
string currentAccount = "";
bool IsValidLine = false;
var lineNo = 1;
while ((line = file.ReadLine()) != null)
{
    if (lineNo <= 36)
    {
        lineNo++;
    }
    else
    {
        line = Regex.Replace(line, @"\s+", " ");
        var tokens = line.Split(' ');
        if (tokens.Count() >= 4 && tokens.Contains("PCAT:"))
        {
            currentPCat = tokens[1];
            currentAccount = tokens[2];
            IsValidLine = true;
        }
        else if (tokens.Count() == 7)
        {
            if (IsValidLine)
            {
                output = output + currentPCat + "," + currentAccount + "," + tokens[1] + "," + tokens[2] + "," + tokens[3] + "," + tokens[4] + "," + tokens[5] + "," + tokens[6] + "\r\n";
            }
        }
        else
        {
            IsValidLine = false;
        }
        lineNo++;
    }
}
The part that I really need to change is the tokenizer so this part:
line = Regex.Replace(line, @"\s+", " ");
var tokens = line.Split(' ');
I think I need to remove the first line, and I want the string to be split on 2 or more spaces. How can I do this?
Sure - use the overload of String.Split that takes string delimiters instead of char delimiters:
var tokens = line.Split(new string[] {"  "}, StringSplitOptions.RemoveEmptyEntries)
                 .Select(s => s.Trim())
                 .ToArray();
The Trim() is necessary to remove leading/trailing spaces if there are an odd number of spaces between segments.
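To see how the two-space delimiter behaves, here is a minimal sketch of that approach as a complete program. The class name SplitDemo, the helper Tokenize, and the exact spacing in the sample line are assumptions made for illustration; the real file's column spacing is not shown in the question. The extra Where filter is also an addition: it drops tokens that become empty only after trimming, for example when a line ends with three spaces.

```csharp
using System;
using System.Linq;

public static class SplitDemo
{
    // Hypothetical helper: split on a two-space delimiter, trim each piece,
    // and drop pieces that are empty after trimming.
    public static string[] Tokenize(string line)
    {
        return line.Split(new[] { "  " }, StringSplitOptions.RemoveEmptyEntries)
                   .Select(s => s.Trim())
                   .Where(s => s.Length > 0)
                   .ToArray();
    }

    public static void Main()
    {
        // Assumed spacing: columns separated by 2, 3, or 5 spaces.
        string line = "H50 ENGINE*   Sign  1.00  each   9027.65000     9027.65";

        // "H50 ENGINE*" stays one token because it contains only a single space.
        foreach (var token in Tokenize(line))
        {
            Console.WriteLine("[" + token + "]");
        }
    }
}
```

Note that this only treats space characters as delimiters. If the file separates columns with tabs, this overload will not split on them.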
Instead of using String.Split, use Regex.Split and pass "\\s{2,}" as the pattern (written below as the equivalent verbatim string @"\s{2,}").
string[] tokens = Regex.Split(line, @"\s{2,}");
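As a rough sketch of the Regex.Split approach in context, the program below splits one sample line on runs of two or more whitespace characters. The class name RegexSplitDemo, the helper Tokenize, and the spacing in the sample line (including the leading spaces) are assumptions for illustration. The call to Trim() is also an addition: Regex.Split returns an empty string at the start or end of the array when the input begins or ends with a match, so trimming first avoids those empty tokens.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitDemo
{
    // Hypothetical helper: trim the line, then split on 2+ whitespace characters.
    public static string[] Tokenize(string line)
    {
        return Regex.Split(line.Trim(), @"\s{2,}");
    }

    public static void Main()
    {
        // Assumed spacing: two leading spaces and 2 or 3 spaces between columns.
        string line = "  H50 ENGINE*   Sign  1.00  each   9027.65000   9027.65";

        string[] tokens = Tokenize(line);

        // The single space inside "H50 ENGINE*" does not match \s{2,},
        // so the name survives as one token.
        for (int i = 0; i < tokens.Length; i++)
        {
            Console.WriteLine(i + ": [" + tokens[i] + "]");
        }
    }
}
```

If the existing loop is kept, its token-count check and indexes may need adjusting. The Count() == 7 test and tokens[1] through tokens[6] suggest the old tokenizer produced an empty first token from leading whitespace; with the trimmed input above, a data line yields six tokens at indexes 0 through 5.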