I have a bunch of text files that have a custom format, which looks like this:
App Name
Export Layout
Produced at 24/07/2011 09:53:21
Field Name Length
NAME 100
FULLNAME1 150
ADDR1 80
ADDR2 80
Any whitespace may be tabs or spaces. The file may contain any number of field names and lengths.
I want to get all the field names and their corresponding field lengths and perhaps store them in a dictionary. This information will be used to process a corresponding fixed width data file having the mentioned field names and field lengths.
I know how to skip lines using ReadLine(). What I don't know is how to say: "When you reach the line that starts with 'Field Name', skip one more line, then starting from the next line, grab all the words on the left column and the numbers on the right column."
I have tried String.Trim(), but that doesn't remove the whitespace in between.
Thanks in advance.
You can use SkipWhile(l => !l.TrimStart().StartsWith("Field Name")).Skip(1):
Dictionary<string, string> allFieldLengths = File.ReadLines("path")
.SkipWhile(l => !l.TrimStart().StartsWith("Field Name")) // skips lines that don't start with "Field Name"
.Skip(1) // go to next line
.SkipWhile(l => string.IsNullOrWhiteSpace(l)) // skip following empty line(s)
.Select(l =>
{ // anonymous method to use "real code"
var line = l.Trim(); // remove spaces or tabs from start and end of line
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
return new { line, token }; // return an anonymous type holding the trimmed line and its tokens
})
.Where(x => x.token.Length == 2) // keep only lines with exactly two tokens (anything else is invalid data)
.Select(x => new { FieldName = x.token[0], Length = x.token[1] })
.GroupBy(x => x.FieldName) // groups lines by FieldName; every group contains its Key plus all anonymous types that belong to it
.ToDictionary(xg => xg.Key, xg => string.Join(",", xg.Select(x => x.Length)));
line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) splits on spaces and tabs and discards empty entries. GroupBy ensures that all keys in the dictionary are unique. If a field name occurs more than once, its lengths are joined with commas.
Edit: since you have requested a non-LINQ version, here it is:
Dictionary<string, string> allFieldLengths = new Dictionary<string, string>();
bool headerFound = false;
bool dataFound = false;
foreach (string l in File.ReadLines("path"))
{
string line = l.Trim();
if (!headerFound && line.StartsWith("Field Name"))
{
headerFound = true;
// skip this line:
continue;
}
if (!headerFound)
continue;
if (!dataFound && line.Length > 0)
dataFound = true;
if (!dataFound)
continue;
string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries); // split on spaces and tabs, as in the LINQ version
if (token.Length != 2)
continue;
string fieldName = token[0];
string length = token[1];
string lengthInDict;
if (allFieldLengths.TryGetValue(fieldName, out lengthInDict))
// append this length
allFieldLengths[fieldName] = lengthInDict + "," + length;
else
allFieldLengths.Add(fieldName, length);
}
I like the LINQ version more because it's much more readable and maintainable (imo).
Assuming the position of the header line is fixed, the actual key-value pairs start on the 9th line. Using the ReadAllLines method, which returns a String array from the file, we just start processing from index 8 onwards:
string[] lines = File.ReadAllLines(filepath);
Dictionary<string,int> pairs = new Dictionary<string,int>();
for(int i=8;i<lines.Length;i++)
{
string[] pair = Regex.Replace(lines[i],"(\\s)+",";").Split(';');
pairs.Add(pair[0],int.Parse(pair[1]));
}
This is a skeleton, not accounting for exception handling, but I guess it should get you started.
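As a rough sketch of what that exception handling might look like, the variant below keeps the fixed-offset assumption but skips blank or malformed lines and duplicate names instead of throwing. The class name FixedOffsetLayoutReader, the method ReadPairs, and the firstDataIndex parameter are hypothetical names introduced for illustration. It also uses Regex.Split rather than replacing whitespace with ';' first.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FixedOffsetLayoutReader
{
    // Hypothetical helper; firstDataIndex would be 8 under the fixed-header assumption.
    public static Dictionary<string, int> ReadPairs(string[] lines, int firstDataIndex)
    {
        var pairs = new Dictionary<string, int>();
        for (int i = firstDataIndex; i < lines.Length; i++)
        {
            // Split on runs of whitespace; Trim avoids an empty leading token.
            string[] pair = Regex.Split(lines[i].Trim(), @"\s+");
            int length;
            if (pair.Length != 2 || !int.TryParse(pair[1], out length))
                continue; // blank or malformed line: skip rather than throw
            if (pairs.ContainsKey(pair[0]))
                continue; // duplicate name: keep the first value so Add cannot throw
            pairs.Add(pair[0], length);
        }
        return pairs;
    }
}
```

It could be called as FixedOffsetLayoutReader.ReadPairs(File.ReadAllLines(filepath), 8). Whether silently skipping bad lines is acceptable depends on how strict the import needs to be.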
You can use String.StartsWith() to detect "Field Name", then String.Split() with a parameter of null to split by whitespace. This will get you your field name and length strings.
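A minimal sketch of that approach, assuming the layout shown in the question. LayoutParser and ParseLayout are hypothetical names, and the lengths are parsed to int here rather than kept as strings. Passing a null separator array to Split makes it split on any whitespace character. StringSplitOptions.RemoveEmptyEntries is added so that runs of spaces or tabs do not produce empty tokens.

```csharp
using System;
using System.Collections.Generic;

public static class LayoutParser
{
    // Hypothetical helper: collects "NAME length" pairs that follow the "Field Name" header.
    public static Dictionary<string, int> ParseLayout(IEnumerable<string> lines)
    {
        var fields = new Dictionary<string, int>();
        bool headerFound = false;

        foreach (string raw in lines)
        {
            string line = raw.Trim();

            if (!headerFound)
            {
                // Everything up to and including the header line is ignored.
                headerFound = line.StartsWith("Field Name", StringComparison.Ordinal);
                continue;
            }

            // A null separator array means "split on any whitespace character".
            string[] tokens = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

            // Skip blank or malformed lines instead of throwing.
            int length;
            if (tokens.Length == 2 && int.TryParse(tokens[1], out length))
                fields[tokens[0]] = length; // a repeated name overwrites the earlier value
        }
        return fields;
    }
}
```

It could be called as LayoutParser.ParseLayout(File.ReadLines("path")). Note that Dictionary does not guarantee enumeration order, so if the field order matters for slicing the fixed-width data file, a List of name/length pairs may be a better fit.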