How do you parse multi-level “nodes” in text?

Question

I have a configuration format similar to the *.sln format. Take the following as an example:

DCOM Productions Configuration File, Format Version 1.0

BeginSection:Global
    GlobalKeyA = AnswerOne

    .: Stores the global configuration key
    :: for the application. This key is used
    :: to save the current state of the app.
    :: as well as prevent lockups
    GlobalKey3 = AnswerTwo

    .: Secondary Key. See above setting
    GlobalKeyC = AnswerThree

    BeginSection: UpdateSystem
        NestedKeyA = One
        NestedKeyB = Two
        NestedKeyC = { A set of multiline data
                      where we will show how
                      to write a multiline
                      paragraph }
        NestedKeyD = System.Int32, 100
    EndSection
EndSection

BeginSection:Application
    InstallPath = C:\Program Files\DCOM Productions\BitFlex
EndSection

I know that I will probably need a recursive function that takes a segment of text as a parameter, so that I can, for example, pass an entire section to it and parse it recursively.

I just can't seem to get my head around how to do this. Each section can potentially have more child sections, much like an XML document. I'm not really asking for code here, just a methodology for parsing a document like this.

I was thinking about using the tab indentation to determine which section I am working with, but this would fail if the document was not tabbed (formatted) correctly. Any better thoughts?

Answer 1

Perhaps you can draw a parallel between this format and XML, i.e. BeginSection <==> "<opening>" and EndSection <==> "</closing>".

Think of it as an XML file with many root elements. What's inside BeginSection and EndSection will be your inner XML nodes, with, for example, NestedKeyA as the node name and "One" as the value.
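
To make the open/close parallel concrete, here is a minimal sketch that keeps a stack of open sections and ignores indentation entirely, so badly tabbed files still parse. The names ConfigSection and SectionParser are made up for illustration, and the sketch assumes every line that is not a BeginSection/EndSection marker and contains '=' is a key. Comments and multiline values are not handled properly here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model: a section has a name, key/value pairs, and child sections.
public sealed class ConfigSection {
    public string Name { get; }
    public Dictionary<string, string> Keys { get; } = new Dictionary<string, string>();
    public List<ConfigSection> Sections { get; } = new List<ConfigSection>();
    public ConfigSection(string name) { Name = name; }
}

public static class SectionParser {
    // Returns a synthetic root whose children are the top-level sections.
    public static ConfigSection Parse(string text) {
        var root = new ConfigSection("(root)");
        var open = new Stack<ConfigSection>();   // innermost open section is on top
        open.Push(root);

        foreach (string raw in text.Split('\n')) {
            string line = raw.Trim();   // drops indentation and any trailing '\r'

            if (line.StartsWith("BeginSection:", StringComparison.Ordinal)) {
                // Opening "tag": attach a child to the current section and descend into it.
                var child = new ConfigSection(line.Substring("BeginSection:".Length).Trim());
                open.Peek().Sections.Add(child);
                open.Push(child);
            }
            else if (line == "EndSection") {
                // Closing "tag": go back up one level.
                if (open.Count == 1)
                    throw new FormatException("EndSection without a matching BeginSection.");
                open.Pop();
            }
            else {
                // Lines without '=' (blank lines, the header line) are skipped in this sketch.
                int eq = line.IndexOf('=');
                if (eq > 0)
                    open.Peek().Keys[line.Substring(0, eq).Trim()] = line.Substring(eq + 1).Trim();
            }
        }

        if (open.Count != 1)
            throw new FormatException("Missing EndSection for '" + open.Peek().Name + "'.");
        return root;
    }
}
```

The stack does the job of the recursion: whatever is on top is the section currently being filled, so no text segments need to be cut out and passed around.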

.: seems to be a comment, so you can skip it. System.Int32, 100 can be an attribute and a value of a node.

{ A set of multiline data where we will show how to write a multiline paragraph } - you can come up with an algorithm to parse this as well.
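
For the comments and the braced multiline values, one possible approach is a small pre-pass that turns physical lines into logical lines before any key parsing happens. The sketch below is only an assumption about how the format behaves: it treats lines starting with ".:" or "::" as comments, and treats a value that opens with "{" as continuing until a line ends with "}". LogicalLines is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class LogicalLines {
    // Drops blank and comment lines and joins each { ... } value onto a single line.
    public static List<string> Read(string text) {
        var result = new List<string>();
        StringBuilder pending = null;   // non-null while inside an unclosed { ... } value

        foreach (string raw in text.Split('\n')) {
            string line = raw.Trim();

            if (pending != null) {
                // Continuation of a multiline value: append until the closing brace.
                pending.Append(' ').Append(line);
                if (line.EndsWith("}", StringComparison.Ordinal)) {
                    result.Add(pending.ToString());
                    pending = null;
                }
                continue;
            }

            // Assumed comment markers: ".:" starts a comment, "::" continues it.
            if (line.Length == 0
                || line.StartsWith(".:", StringComparison.Ordinal)
                || line.StartsWith("::", StringComparison.Ordinal))
                continue;

            int eq = line.IndexOf('=');
            bool opensBlock = eq >= 0
                && line.Substring(eq + 1).TrimStart().StartsWith("{", StringComparison.Ordinal);

            if (opensBlock && !line.EndsWith("}", StringComparison.Ordinal))
                pending = new StringBuilder(line);   // value continues on following lines
            else
                result.Add(line);
        }

        if (pending != null)
            throw new FormatException("Unterminated '{' value.");
        return result;
    }
}
```

After this pass, every remaining line is either a section marker or a complete Key = Value pair, which keeps the section parsing itself simple.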

Answer 2

Alrighty, I did it. * phew *

using System;
using System.Text.RegularExpressions;

namespace DCOMProductions.BitFlex.IO {

/// <summary>
/// Reads and parses xdf strings
/// </summary>
public sealed class XdfReader {
    /// <summary>
    /// Instantiates a new instance of the DCOMProductions.BitFlex.IO.XdfReader class.
    /// </summary>
    public XdfReader() {
        //
        // TODO: Any constructor code here
        //
    }

    #region Constants

    /// <devdoc>
    /// This regular expression matches against a section beginning. A section may look like the following:
    /// 
    ///     SectionName:Begin
    ///     
    /// Where 'SectionName' is the name of the section, and ':Begin' represents that this is the
    /// opening tag for the section. This allows the parser to differentiate between open and
    /// close tags.
    /// </devdoc>
    private const String SectionBeginRegularExpression = @"[0-9a-zA-Z]*:Begin";

    /// <devdoc>
    /// This regular expression matches against a section ending. A section may look like the following:
    /// 
    ///     SectionName:End
    ///     
    /// Where 'SectionName' is the name of the section, and ':End' represents that this is the
    /// closing tag for the section. This allows the parser to differentiate between open and
    /// close tags.
    /// </devdoc>
    private const String SectionEndRegularExpression = @"[0-9a-zA-Z]*:End";

    /// <devdoc>
    /// This regular expression matches against a key and its value. A key may look like the following:
    /// 
    ///     KeyName=KeyValue
    ///     KeyName = KeyValue
    ///     KeyName =KeyValue
    ///     KeyName= KeyValue
    ///     KeyName    =       KeyValue
    ///                 
    /// And so on. This regular expression matches all of these; the whitespace
    /// before and after the assignment operator is optional.
    /// </devdoc>
    private const String KeyRegularExpression = @"[0-9a-zA-Z]*\s*?=\s*?[^\r]*";

    #endregion

    #region Methods

    public void Flush() {
        throw new System.NotImplementedException();
    }

    private String GetSectionName(String xdf) {
        Match sectionMatch = Regex.Match(xdf, SectionBeginRegularExpression);

        if (sectionMatch.Success) {
            String retVal = sectionMatch.Value;
            retVal = retVal.Substring(0, retVal.IndexOf(':'));
            return retVal;
        }
        else {
            throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section.");
        }
    }

    public XdfFile ReadFile(String fileName) {
        throw new System.NotImplementedException();
    }

    public XdfKey ReadKey(String xdf) {
        Match keyMatch = Regex.Match(xdf, KeyRegularExpression);

        if (keyMatch.Success) {
            String name = keyMatch.Value.Substring(0, keyMatch.Value.IndexOf('='));
            name = name.TrimEnd(' ');

            XdfKey retVal = new XdfKey(name);

            String value = keyMatch.Value.Remove(0, keyMatch.Value.IndexOf('=') + 1);
            value = value.TrimStart(' ');

            retVal.Value = value;
            return retVal;
        }
        else {
            throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid key.");
        }
    }

    public XdfSection ReadSection(String xdf) {
        if (ValidateSection(xdf)) {
            String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.RemoveEmptyEntries);
            XdfSection rootSection = new XdfSection(GetSectionName(rows[0]));
            System.Diagnostics.Debug.WriteLine(rootSection.Name);

            do {
                Match beginMatch = Regex.Match(xdf, SectionBeginRegularExpression);
                beginMatch = beginMatch.NextMatch();

                if (beginMatch.Success) {
                    Match endMatch = Regex.Match(xdf, String.Format("{0}:End", GetSectionName(beginMatch.Value)));

                    if (endMatch.Success) {
                        String sectionXdf = xdf.Substring(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index);
                        xdf = xdf.Remove(beginMatch.Index, (endMatch.Index + endMatch.Length) - beginMatch.Index);

                        XdfSection section = ReadSection(sectionXdf);
                        System.Diagnostics.Debug.WriteLine(section.Name);

                        rootSection.Sections.Add(section);
                    }
                    else {
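                        // endMatch failed here, so report the position of the unmatched opening tag instead.
                        throw new BitFlex.IO.XdfException(String.Format("The section beginning at index {0} has no matching section ending.", beginMatch.Index));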
                    }
                }
                else {
                    break;
                }
            } while (true);

            MatchCollection keyMatches = Regex.Matches(xdf, KeyRegularExpression);

            foreach (Match item in keyMatches) {
                XdfKey key = ReadKey(item.Value);
                rootSection.Keys.Add(key);
            }

            return rootSection;
        }
        else {
            throw new BitFlex.IO.XdfException("The specified xdf did not contain a valid section.");
        }
    }

    private Boolean ValidateSection(String xdf) {
        String[] rows = xdf.Split(new String[] { "\r\n" }, StringSplitOptions.None);

        if (Regex.Match(rows[0], SectionBeginRegularExpression).Success) {
            if (Regex.Match(rows[rows.Length - 1], SectionEndRegularExpression).Success) {
                return true;
            }
            else {
                return false;
            }
        }
        else {
            return false;
        }
    }

    #endregion
}

}
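
The reader above uses XdfKey, XdfSection, XdfFile and XdfException, none of which are shown. The following is only a guess at the minimum they would need for the reader to compile, inferred from how the reader calls them (constructors taking a name, a settable Value, Keys and Sections collections). The real classes may well look different.

```csharp
using System;
using System.Collections.Generic;

namespace DCOMProductions.BitFlex.IO {
    // Assumed shape: a named key with a string value.
    public sealed class XdfKey {
        public XdfKey(String name) { Name = name; }
        public String Name { get; private set; }
        public String Value { get; set; }
    }

    // Assumed shape: a named section holding keys and nested sections.
    public sealed class XdfSection {
        public XdfSection(String name) {
            Name = name;
            Keys = new List<XdfKey>();
            Sections = new List<XdfSection>();
        }
        public String Name { get; private set; }
        public List<XdfKey> Keys { get; private set; }
        public List<XdfSection> Sections { get; private set; }
    }

    // Placeholder only: ReadFile is not implemented in the reader, so nothing is known about this type.
    public sealed class XdfFile { }

    // Assumed shape: a plain exception carrying a message.
    public class XdfException : Exception {
        public XdfException(String message) : base(message) { }
    }
}
```

Note that the regular expressions in the reader expect section markers of the form SectionName:Begin and SectionName:End separated by "\r\n", not the BeginSection:Name / EndSection form from the question, so input has to be written that way before calling ReadSection.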

2 answers

solution1
2 ACCEPTED 2009-07-25 01:10:47

solution2
0 2009-07-28 20:04:29