
Parsing a text file with a custom format in C#

I have a bunch of text files that have a custom format, which looks like this:

App Name    
Export Layout

Produced at 24/07/2011 09:53:21


Field Name                             Length                                                       

NAME                                   100                                                           
FULLNAME1                              150                                                           
ADDR1                                  80                                                           
ADDR2                                  80          

Any whitespace may be tabs or spaces. The file may contain any number of field names and lengths.

I want to get all the field names and their corresponding field lengths and perhaps store them in a dictionary. This information will be used to process a corresponding fixed-width data file that uses those field names and field lengths.
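As a sketch of that downstream step, assuming the layout is kept as an ordered list of (field name, length) pairs and that each record is one line of the data file, a record can be cut into fields by walking the lengths and advancing an offset. FixedWidthRecord and Slice are hypothetical names. An ordered list is used instead of a Dictionary because column order matters and Dictionary<TKey, TValue> does not guarantee enumeration order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: slices one fixed-width record using a field layout
// (name, length) read from the layout file. Assumes field names are unique.
public static class FixedWidthRecord
{
    public static Dictionary<string, string> Slice(
        string record, IReadOnlyList<KeyValuePair<string, int>> layout)
    {
        var values = new Dictionary<string, string>();
        int offset = 0;
        foreach (var field in layout)
        {
            // Clamp to the record length so a short record (for example one
            // whose trailing spaces were trimmed) yields a shorter or empty
            // value instead of throwing ArgumentOutOfRangeException.
            int start = Math.Min(offset, record.Length);
            int length = Math.Min(field.Value, record.Length - start);
            // Fixed-width data is usually space-padded, so drop the padding.
            values[field.Key] = record.Substring(start, length).TrimEnd();
            offset += field.Value;
        }
        return values;
    }
}
```

For example, with a layout of NAME (4) and ADDR1 (10), Slice("Ann 12 High St", layout) returns NAME = "Ann" and ADDR1 = "12 High St"; a record shorter than the layout yields empty values rather than an exception.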

I know how to skip lines using ReadLine(). What I don't know is how to say: "When you reach the line that starts with 'Field Name', skip one more line, then starting from the next line, grab all the words on the left column and the numbers on the right column."

I have tried String.Trim() but that doesn't remove the whitespace in between.
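A minimal illustration of the difference: Trim() only touches the ends of the string, while Split with a space/tab separator array and StringSplitOptions.RemoveEmptyEntries treats any run of whitespace between the columns as one separator. WhitespaceDemo and its methods are hypothetical names used only for this example.

```csharp
using System;

public static class WhitespaceDemo
{
    // Trim() removes only leading and trailing whitespace; the run of
    // spaces/tabs between the two columns is left untouched.
    public static string TrimOnly(string line) => line.Trim();

    // Splitting on space and tab with RemoveEmptyEntries discards the empty
    // strings produced between consecutive separators, so any run of
    // whitespace acts as a single separator and only the column values remain.
    public static string[] Columns(string line) =>
        line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
}
```

For the line "  NAME \t  100  ", TrimOnly returns "NAME \t  100" (inner whitespace intact), while Columns returns the two values "NAME" and "100".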

Thanks in advance.

You can use SkipWhile(l => !l.TrimStart().StartsWith("Field Name")).Skip(1):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

Dictionary<string, string> allFieldLengths = File.ReadLines("path")
    .SkipWhile(l => !l.TrimStart().StartsWith("Field Name")) // skip everything before the "Field Name" header line
    .Skip(1)                                       // skip the header line itself
    .SkipWhile(l => string.IsNullOrWhiteSpace(l))  // skip the following empty line(s)
    .Select(l =>
    {                                              // statement lambda so we can use "real code"
        var line = l.Trim();                       // remove spaces or tabs from start and end of line
        string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
        return new { line, token };                // return an anonymous type holding the line and its tokens
    })
    .Where(x => x.token.Length == 2)               // ignore lines that don't have exactly two fields (blank or invalid data)
    .Select(x => new { FieldName = x.token[0], Length = x.token[1] })
    .GroupBy(x => x.FieldName)                     // group by FieldName; every group contains its Key plus all entries with that name
    .ToDictionary(xg => xg.Key, xg => string.Join(",", xg.Select(x => x.Length)));

line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries) splits on spaces and tabs and discards the empty entries, so a run of whitespace counts as a single separator. GroupBy ensures that every key in the dictionary is unique: if a field name appears more than once, its lengths are joined with a comma.
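To see the duplicate handling without a file on disk, the same query can take its lines as a parameter, so it can be fed from File.ReadLines(path) or from an in-memory array. This is a condensed sketch of the query above, not a drop-in replacement: LayoutParser and Parse are hypothetical names, and the separate SkipWhile for blank lines is dropped because blank lines split into zero tokens and are removed by the Where filter anyway.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LayoutParser
{
    // Condensed version of the LINQ query: skip to the header, skip the
    // header itself, keep lines with exactly two tokens, and join the
    // lengths of duplicate field names with a comma.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines) =>
        lines
            .SkipWhile(l => !l.TrimStart().StartsWith("Field Name"))
            .Skip(1)
            .Select(l => l.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries))
            .Where(token => token.Length == 2)
            .GroupBy(token => token[0])
            .ToDictionary(g => g.Key, g => string.Join(",", g.Select(token => token[1])));
}
```

Given the lines "App Name", "Field Name    Length", "", "NAME\t100", "ADDR1   80" and "ADDR1\t 90", Parse maps NAME to "100" and ADDR1 to "80,90". The initial SkipWhile matters: "App Name" also has exactly two tokens and would otherwise be picked up as a field.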


Edit: since you requested a non-LINQ version, here it is:

using System;
using System.Collections.Generic;
using System.IO;

Dictionary<string, string> allFieldLengths = new Dictionary<string, string>();
bool headerFound = false;
bool dataFound = false;
foreach (string l in File.ReadLines("path"))
{
    string line = l.Trim();
    if (!headerFound && line.StartsWith("Field Name"))
    {
        headerFound = true;
        // skip this line:
        continue;
    }
    if (!headerFound)
        continue;
    if (!dataFound && line.Length > 0)
        dataFound = true;
    if (!dataFound)
        continue;
    // split on spaces and tabs, since the columns may be separated by either
    string[] token = line.Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
    if (token.Length != 2)
        continue;
    string fieldName = token[0];
    string length = token[1];
    string lengthInDict;
    if (allFieldLengths.TryGetValue(fieldName, out lengthInDict))
        // append this length
        allFieldLengths[fieldName] = lengthInDict + "," + length;
    else
        allFieldLengths.Add(fieldName, length);
}

I like the LINQ version more because it's much more readable and maintainable (imo).

Assuming the position of the header line is fixed, the actual key-value pairs start on the 9th line. Using File.ReadAllLines, which returns the file as a string array, we can simply start processing at index 8:

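using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

string[] lines = File.ReadAllLines(filepath);
Dictionary<string, int> pairs = new Dictionary<string, int>();

for (int i = 8; i < lines.Length; i++)
{
    string line = lines[i].Trim();   // leading whitespace would otherwise produce an empty first token
    if (line.Length == 0)
        continue;                    // skip blank lines, e.g. at the end of the file
    string[] pair = Regex.Replace(line, "(\\s)+", ";").Split(';');
    pairs.Add(pair[0], int.Parse(pair[1]));
}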

This is a skeleton, not accounting for exception handling, but I guess it should get you started.
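If the skeleton needs to tolerate blank or malformed lines, one possible hardening, still assuming the data starts at a fixed index, is to skip lines that do not split into exactly two columns, parse the length with int.TryParse, and let a duplicate name overwrite the earlier entry instead of letting Add throw. FixedOffsetParser and the firstDataIndex parameter are hypothetical; last-one-wins is an arbitrary choice here, unlike the comma-joining used in the LINQ answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FixedOffsetParser
{
    // Assumes, like the skeleton, that the data starts at a fixed index
    // (8 for the sample file). firstDataIndex is a hypothetical parameter.
    public static Dictionary<string, int> Parse(string[] lines, int firstDataIndex = 8)
    {
        var pairs = new Dictionary<string, int>();
        for (int i = firstDataIndex; i < lines.Length; i++)
        {
            // Trim first so leading whitespace doesn't yield an empty token,
            // then split on any run of whitespace.
            string[] pair = Regex.Split(lines[i].Trim(), @"\s+");
            // A blank line splits into a single empty token; skip it along
            // with any line that doesn't have exactly two columns.
            if (pair.Length != 2)
                continue;
            // Skip lines whose length isn't an integer; for a duplicate name
            // the last occurrence wins instead of Add() throwing.
            if (int.TryParse(pair[1], out int length))
                pairs[pair[0]] = length;
        }
        return pairs;
    }
}
```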

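You can use String.StartsWith() to detect "Field Name". Then call String.Split((char[])null, StringSplitOptions.RemoveEmptyEntries): a null separator splits on whitespace, and RemoveEmptyEntries drops the empty strings that runs of spaces or tabs would otherwise produce. This gives you the field name and length strings.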
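A sketch of that approach, reading line by line with ReadLine() as the question describes: it skips everything up to the header, then collects each line that splits into exactly two whitespace-separated parts with an integer second part. HeaderScan and ReadLayout are hypothetical names; taking a TextReader lets the same method read from File.OpenText(path) or from a StringReader.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderScan
{
    public static List<KeyValuePair<string, int>> ReadLayout(TextReader reader)
    {
        var fields = new List<KeyValuePair<string, int>>();
        string line;

        // Skip the preamble until the "Field Name" header (or end of input).
        while ((line = reader.ReadLine()) != null)
        {
            if (line.TrimStart().StartsWith("Field Name", StringComparison.Ordinal))
                break;
        }

        // Read the remaining lines; a null separator splits on any whitespace.
        // Blank lines split into zero parts and are skipped.
        while ((line = reader.ReadLine()) != null)
        {
            string[] parts = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (parts.Length == 2 && int.TryParse(parts[1], out int length))
                fields.Add(new KeyValuePair<string, int>(parts[0], length));
        }
        return fields;
    }
}
```

For a file, call it as using (var reader = File.OpenText(path)) { var layout = HeaderScan.ReadLayout(reader); }. If no "Field Name" line is found, the result is empty. Returning a list keeps the fields in file order, which matters when the lengths are later used as column widths.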


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM