
Parsing tree in C#

I have a [textual] tree like this:

+---step-1
|   +---step_2
|   |   +---step3
|   |   \---step4
|   +---step_2.1
|   \---step_2.2
+---step1.2

Tree 2:

+---step-1
|   \---step_2
|   |   +---step3
|   |   \---step4
+---step1.2

This is just a small example; the tree can be deeper and have more children.

Right now I'm doing this:

for (int i = 0; i < cmdOutList.Count; i++)
{
    string s = cmdOutList[i];
    String value = Regex.Match(s, @"(?<=\---).*").Value;
    value = value.Replace("\r", "");
    if (s[1].ToString() == "-")
    {
        DirectoryNode p = new DirectoryNode { Name = value };
        //p.AddChild(f);
        directoryList.Add(p);
    }
    else
    {
        DirectoryNode f = new DirectoryNode { Name = value };
        directoryList[i - 1].AddChild(f);
        directoryList.Add(f);
    }
}

But this doesn't handle "step_2.1" and "step_2.2".

I think I'm doing this totally wrong, maybe someone can help me out with this.

EDIT :

Here is the DirectoryNode class, to make that a bit clearer:

public class DirectoryNode
{
    public DirectoryNode()
    {
        this.Children = new List<DirectoryNode>();
    }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        this.Children.Add(child);
    }
}

If your text is that simple (just either +--- or \--- preceded by a series of |), then a regex might be more than you need (and may be what's tripping you up).

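DirectoryNode currentParent = null; // parent for nodes at the current depth (null at root level)
DirectoryNode current = null;       // node created for the previous line
int lastStartIndex = 0;             // column of the previous line's marker

foreach(string temp in cmdOutList)
{
    string line = temp.TrimEnd('\r');

    // Column of the "+" or "\" that starts the marker; each nesting level adds 4 columns.
    int startIndex = Math.Max(line.IndexOf("+"), line.IndexOf(@"\"));
    if(startIndex < 0) continue; // skip lines that contain no marker

    line = line.Substring(startIndex);

    if(startIndex > lastStartIndex)
    {
        // One level deeper: the previous node becomes the parent.
        currentParent = current;
    }
    else if(startIndex < lastStartIndex)
    {
        // Shallower: climb one parent for every 4 columns of indentation lost.
        for(int i = 0; i < (lastStartIndex - startIndex) / 4; i++)
        {
            if(currentParent == null) break;

            currentParent = currentParent.ParentObject;
        }
    }

    lastStartIndex = startIndex;

    // Drop the 4-character marker ("+---" or "\---") to get the name.
    current = new DirectoryNode() { Name = line.Substring(4) };

    if(currentParent != null)
    {
        currentParent.AddChild(current);
    }
    else
    {
        directoryList.Add(current);
    }
}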

Regex definitely looks unnecessary here, since the symbols in your markup language (that's what it is, after all) are both static and few. That is: although the label names may vary, the tokens you need to look for when parsing lines into relevant pieces will never be anything other than +---, \---, and |.

From a question I answered yesterday: "Regexes are extremely useful for describing a whole class of needles in a largely unknown haystack, but they're not the right tool for input that's in a very static format."

String manipulation is what you want for parsing this, especially since you're dealing with a recursive markup language, which can't be fully understood by regex anyway. I'd also suggest creating a tree-type data structure to store the data (which, surprisingly, doesn't seem to be included in the framework unless they added it after 2.0).
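
To illustrate the string-manipulation approach, here is a minimal sketch that uses a stack of ancestors instead of a regex. It assumes every nesting level is exactly 4 columns wide and that each line carries a +--- or \--- marker; the TreeParser class and its Parse method are hypothetical names introduced for this example, and DirectoryNode is repeated from the question so the snippet is self-contained.

```csharp
using System;
using System.Collections.Generic;

public class DirectoryNode
{
    public DirectoryNode()
    {
        this.Children = new List<DirectoryNode>();
    }
    public DirectoryNode ParentObject { get; set; }
    public string Name;
    public List<DirectoryNode> Children { get; set; }

    public void AddChild(DirectoryNode child)
    {
        child.ParentObject = this;
        this.Children.Add(child);
    }
}

// Hypothetical helper, not part of the original posts.
public static class TreeParser
{
    public static List<DirectoryNode> Parse(IEnumerable<string> lines)
    {
        var roots = new List<DirectoryNode>();
        // Ancestors of the line being processed; path.Count equals its depth.
        var path = new Stack<DirectoryNode>();

        foreach (string raw in lines)
        {
            string line = raw.TrimEnd('\r', '\n');

            // Locate whichever marker ("+---" or "\---") appears first.
            int marker = line.IndexOf("+---", StringComparison.Ordinal);
            int alt = line.IndexOf(@"\---", StringComparison.Ordinal);
            if (marker < 0 || (alt >= 0 && alt < marker)) marker = alt;
            if (marker < 0) continue; // not a tree line

            // Assumption: each level of nesting indents by 4 columns.
            int depth = marker / 4;
            var node = new DirectoryNode { Name = line.Substring(marker + 4) };

            // Discard ancestors that are at the same depth or deeper.
            while (path.Count > depth) path.Pop();

            if (path.Count == 0) roots.Add(node);
            else path.Peek().AddChild(node);

            path.Push(node);
        }

        return roots;
    }
}
```

Calling TreeParser.Parse(cmdOutList) on the first tree from the question should give two root nodes, with step_2, step_2.1 and step_2.2 as children of step-1. The stack avoids tracking the previous indentation explicitly: whatever is on top after popping is the parent of the current line.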

As an aside, your regex above seems to have an unnecessary \ in it (the one before the first -), but that doesn't matter in most regex flavors.
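
A small sketch of that point, assuming .NET's regex flavor, where a backslash before a non-alphanumeric character such as - simply matches that character literally. The RegexEscapeDemo class and its method names are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class, not part of the original posts.
public static class RegexEscapeDemo
{
    // The pattern from the question, with the escaped hyphen.
    public static string WithEscape(string line)
    {
        return Regex.Match(line, @"(?<=\---).*").Value;
    }

    // The same pattern without the backslash.
    public static string WithoutEscape(string line)
    {
        return Regex.Match(line, @"(?<=---).*").Value;
    }
}
```

Both methods should return the same name for any line of the tree, since the lookbehind matches the same three hyphens either way.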
