Parsing Indented Text File

I'm having trouble parsing this file into a parent-child relationship. The linkage is given by the leading '--': the more dashes a line has, the deeper it sits in the hierarchy, so the number of dashes determines how a line relates to the lines around it.

Item0
--Item1
----Property1
----Property2
----Item2
------Property1
------Property2
----Item3
----Item4
------Property1
------Property2
----Item5
--Item6
--Item7
----Property1
--End
End

I have this class structure:

public class Section
{
    public string text { get; set; }
    public List<Section> children { get; set; }
    public Section parent { get; set; }

    public Section(String text, Section parent)
    {
        this.text = text;
        this.children = new List<Section>();
        this.parent = parent;
    }

    public Section(String text)
    {
        this.text = text;
        this.children = new List<Section>();
        this.parent = null;
    }
}

And I have this recursive loop structure:

    public void ParseList(Section section, string line)
    {
        if (line.GetLeadingWhitespaceLength() > section.text.GetLeadingWhitespaceLength())
        {

        }
        if (line.GetLeadingWhitespaceLength() < section.text.GetLeadingWhitespaceLength())
        {

        }

        if (line.GetLeadingWhitespaceLength() == section.text.GetLeadingWhitespaceLength())
        {
            if (section.parent != null)
            {
                section.parent.children.Add(new Section(line));
            }
        }
    }

But I cannot connect the dots.

I realize this is a late answer, and the solution I've provided isn't recursive, but it will generate a collection of nodes from your string. Everything you need to make it recursive is below.

To create a recursive algorithm, first determine your base case; after that it's a matter of adding a condition for every remaining case.

In the following solution, one possible base case is: if the string element is null or empty, return the result. Another is: if the previous node's depth is greater than the current node's depth, return the root node and make the current node the new root. Which one you choose determines how you reach the end result. A recursive algorithm may be overkill for this task, since a simple loop and comparison gets you the same outcome. Whichever way you choose, this should get you started.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

namespace SampleTextParsing
{
    class Program
    {
        /// <summary>
        /// String representing the hierarchy to be parsed into objects.
        /// </summary>
        static readonly string fileString = 
        @"Item0
        --Item1
        ----Property1
        ----Property2
        ----Item2
        ------Property1
        ------Property2
        ----Item3
        ----Item4
        ------Property1
        ------Property2
        ----Item5
        --Item6
        --Item7
        ----Property1
        --End
        End";

        static void Main(string[] args)
        {
            // Create a collection of nodes out of the string.
            Queue<BaseNode> nodes = Parse();

            // Display the results to the user.
            Console.WriteLine("Element string\r\n-------------------------------");
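            // Print each line without the indentation that the verbatim string literal carries.
            Console.WriteLine(string.Join(Environment.NewLine,
                fileString.Split(new char[] {'\n', '\r'}, StringSplitOptions.RemoveEmptyEntries).Select(line => line.Trim())));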

            Console.WriteLine("\r\nTotals\r\n-------------------------------");
            DisplayTotals(nodes);

            Console.WriteLine("\r\nHierarchy\r\n-------------------------------");
            while (nodes.Count > 0)
            {
                DisplayRelationships(nodes.Dequeue());
            }

            Console.ReadLine();
        }

        /// <summary>
        /// Parses the hierarchy string into a collection of objects.
        /// </summary>
        /// <returns>A collection of BaseNode objects</returns>
        static Queue<BaseNode> Parse()
        {
            BaseNode root = null;       // Keeps track of the topmost parent (e.g. in this case, Item0 or End).
            BaseNode current = null;    // Keeps track of the node to compare against.
            BaseNode previous = null;   // Keeps track of the previously seen node for comparison.
            Queue<BaseNode> queue = new Queue<BaseNode>();    // Contains a queue of nodes to be returned as the result.

            // Split the string into its elements on carriage returns and line feeds.
            // You can add a white-space character as a third delimiter in case neither of the other two exists in the string (e.g. inline input).
            string[] elements = fileString.Split(new char[] {'\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);

            // Iterate through every string element and create a node object out of it, setting its parent relationship relative to the previous node.
            foreach (var element in elements)
            {
                // Check if a root node has been determined (eg. top most parent).  If not, assign it as the root and set it as the current node.
                if (root == null)
                {
                    root = GetAsElementNode(element);
                    current = root;
                }
                // The root has already been determined and set as the current node, so now we check what its relationship is to the
                // previous node (e.g. child to parent).
                else
                {
                    // Assign the current node as previous, so that we have something to compare against. (eg. Previous to Current)
                    previous = current;

                    // Create a node out of the string element.
                    current = GetAsElementNode(element);

                    // We use the depth (the number of dashes prefixing the element: 0 for a root such as Item0, 2 for a first-level
                    // child such as --Item1) to determine the relationship.
                    // First, let's check whether the previous node is the parent of the current node.
                    if (current.Depth > previous.Depth)
                    {
                        // It is, so assign the previous node as being the parent of the current node.
                        current.Parent = previous;
                    }
                    // The previous node is not the parent, so now lets check to see if the previous node is a sibling of the current node. 
                    // (eg. Do they share the same parent?)
                    else if (current.Depth == previous.Depth)
                    {
                        // They do, so get the previous node's parent, and assign it as the current node's parent as well.
                        current.Parent = previous.Parent;
                    }
                    // The current node is neither a child (deeper) nor a sibling (same depth) of the previous node,
                    // so it must be higher in the hierarchy (its depth is less than the previous node's depth).
                    else
                    {
                        // Walk up from the previous node until we reach an ancestor that is shallower than the current node.
                        // (Searching the queue for the first node at the same depth happens to work for this sample,
                        // but it can pick a node from an earlier, unrelated branch.)
                        BaseNode ancestor = previous.Parent;
                        while (ancestor != null && ancestor.Depth >= current.Depth)
                        {
                            ancestor = ancestor.Parent;
                        }

                        // A null ancestor means the current node is a root node (e.g. End).
                        current.Parent = ancestor;
                    }
                }

                // We now add the node to the queue that will be returned as the result.
                queue.Enqueue(current);
            }

            return queue;
        }

        /// <summary>
        /// Simply outputs to the console the name of the node and its relationship to its parent, if any.
        /// </summary>
        /// <param name="node">The node to output the name of.</param>
        private static void DisplayRelationships(BaseNode node)
        {
            string output = string.Empty;
            if (node.Parent == null)
            {
                output = string.Format("{0} is a root node.", node.Name);
            }
            else
            {
                output = string.Format("{0} is a child of {1}.", node.Name, node.Parent.Name);
            }

            Console.WriteLine(output);
        }

        /// <summary>
        /// Displays the total counts of each relationship.  Root nodes (Item0 and End) are reported separately,
        /// so the childless and parent counts below cover non-root nodes only.
        /// </summary>
        /// <param name="nodes">A queue of nodes to iterate through.</param>
        private static void DisplayTotals(Queue<BaseNode> nodes)
        {
            // Every node that is referenced as a parent by at least one other node.
            var parents = new HashSet<BaseNode>(nodes.Where(node => node.Parent != null).Select(node => node.Parent));

            var totalRoot = nodes.Count(node => node.Parent == null);
            var totalChildren = nodes.Count(node => node.Parent != null);
            // Compare by reference rather than by name, because several nodes share names such as Property1.
            var totalChildless = nodes.Count(node => node.Parent != null && !parents.Contains(node));

            Console.WriteLine("{0} root nodes.", totalRoot);
            Console.WriteLine("{0} child nodes.", totalChildren);
            Console.WriteLine("{0} nodes without children.", totalChildless);
            Console.WriteLine("{0} parent nodes.", totalChildren - totalChildless);
        }

        /// <summary>
        /// Creates a node object from its string equivalent.
        /// </summary>
        /// <param name="element">The parsed string element from the hierarchy string.</param>
        /// <returns>A Node with its name and depth set.</returns>
        static BaseNode GetAsElementNode(string element)
        {
            // Use regex to parse the leading dashes and the name.  Substring would accomplish the same thing.
            // The dash pattern is anchored to the start so that a hyphen inside a name is not mistaken for the link.
            string elementName = Regex.Match(element, "[a-zA-Z0-9]+").Value;
            string link = Regex.Match(element, @"^\s*(-*)").Groups[1].Value;

            // Return a new node with an element name and depth initialized.
            return new Node(elementName, link.Length);
        }
    }

    /// <summary>
    /// A node object which inherits from BaseNode.
    /// </summary>
    public class Node : BaseNode
    {
        public Node()
        {
        }

        /// <summary>
        /// Overloaded constructor which accepts a string element and the depth
        /// </summary>
        /// <param name="elementName">The element as a string</param>
        /// <param name="depth">The depth of the element determined by the number of dashes prefixing the element string.</param>
        public Node(string elementName, int depth)
            : base(elementName, depth)
        {
        }
    }

    /// <summary>
    /// A base node which implements the INode interface. 
    /// </summary>
    public abstract class BaseNode : INode
    {
        public string Name { get; set; }        // The name of the node parsed from the string element.
        public int Depth { get; set; }          // The depth in the hierarchy determined by the number of dashes (eg. Item0 -> --Item1)
        public BaseNode Parent { get; set; }    // The parent of this node.

        public BaseNode()
        {
        }

        /// <summary>
        /// Overloaded constructor which accepts a string element and the depth.
        /// </summary>
        /// <param name="elementName">The element as a string.</param>
        /// <param name="depth">The depth of the element determined by the number of dashes prefixing the element string.</param>
        public BaseNode(string elementName, int depth)
            : this()
        {
            this.Name = elementName;
            this.Depth = depth;
        }
    }

    /// <summary>
    /// The interface that is implemented by the BaseNode base class.
    /// (For this scenario, this is a bit overkill but I figured, if I'm going to propose a solution, to do it right!)
    /// </summary>
    public interface INode
    {
        string Name { get; set; }        // The name of the node parsed from the string element.
        int Depth { get; set; }          // The depth in the hierarchy determined by the number of dashes (eg. Item0 -> --Item1)
        BaseNode Parent { get; set; }    // The parent of this node.
    }
}

The output would be the following:

Element string
-------------------------------
Item0
--Item1
----Property1
----Property2
----Item2
------Property1
------Property2
----Item3
----Item4
------Property1
------Property2
----Item5
--Item6
--Item7
----Property1
--End
End

Totals
-------------------------------
2 root nodes.
15 child nodes.
11 nodes without children.
4 parent nodes.

Hierarchy
-------------------------------
Item0 is a root node.
Item1 is a child of Item0.
Property1 is a child of Item1.
Property2 is a child of Item1.
Item2 is a child of Item1.
Property1 is a child of Item2.
Property2 is a child of Item2.
Item3 is a child of Item1.
Item4 is a child of Item1.
Property1 is a child of Item4.
Property2 is a child of Item4.
Item5 is a child of Item1.
Item6 is a child of Item0.
Item7 is a child of Item0.
Property1 is a child of Item7.
End is a child of Item0.
End is a root node.
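
The answer above says a recursive version is possible once the base case is chosen. As a minimal sketch of one way that could look (not the answerer's code), the parser below treats "no lines left, or the next line is shallower than this level" as the base case and recurses for every deeper run of lines. The names TreeNode, RecursiveParser, Depth and ParseLevel are hypothetical, and the sketch assumes depth is simply the count of leading dashes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type for this sketch; it only stores a name and its children.
public class TreeNode
{
    public string Name { get; }
    public List<TreeNode> Children { get; } = new List<TreeNode>();

    public TreeNode(string name) { Name = name; }
}

public static class RecursiveParser
{
    // Counts the leading dashes of a line, ignoring any leading whitespace.
    static int Depth(string line)
    {
        string trimmed = line.TrimStart();
        return trimmed.Length - trimmed.TrimStart('-').Length;
    }

    // Splits the text into non-blank lines and returns the root nodes.
    public static List<TreeNode> Parse(string text)
    {
        string[] lines = text
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(line => line.Trim().Length > 0)
            .ToArray();
        int index = 0;
        return ParseLevel(lines, ref index, 0);
    }

    // Consumes consecutive lines whose depth is at least minDepth.
    // Base case: no lines are left, or the next line is shallower than minDepth.
    static List<TreeNode> ParseLevel(string[] lines, ref int index, int minDepth)
    {
        var siblings = new List<TreeNode>();
        while (index < lines.Length)
        {
            int depth = Depth(lines[index]);
            if (depth < minDepth)
            {
                break; // This line belongs to an enclosing level.
            }

            var node = new TreeNode(lines[index].Trim().TrimStart('-'));
            index++;

            // Recursive case: any deeper lines that follow are this node's descendants.
            node.Children.AddRange(ParseLevel(lines, ref index, depth + 1));
            siblings.Add(node);
        }
        return siblings;
    }
}
```

Calling RecursiveParser.Parse on the sample text should return two roots, Item0 and End, with the children nested under them. Because the index is passed by reference, each line is visited once even though the calls nest.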

The first thing that comes to mind is to convert it to XML, which would be easier to work with. I don't think recursion is in order here (and I don't see any recursion in your ParseList). I would read each line in order, starting with an opening element <Item0>, and add it to a Stack. Read the next line: if it has the same number of dashes as the previous line, close the element; otherwise keep going. When you do find the same number of dashes, pop the Stack and close the element. Something like that, anyway.
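
The stack idea described above can be sketched without going through XML. The version below is a variation, not the answerer's exact procedure: it builds an object tree shaped like the Section class from the question, and it pops every open node whose depth is greater than or equal to the new line's depth, which also covers lines that step back several levels at once. StackSection and StackParser are hypothetical names, and depth is assumed to be the count of leading dashes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical counterpart of the question's Section class, with the depth stored alongside the text.
public class StackSection
{
    public string Text { get; }
    public int Depth { get; }
    public StackSection Parent { get; private set; }
    public List<StackSection> Children { get; } = new List<StackSection>();

    public StackSection(string text, int depth)
    {
        Text = text;
        Depth = depth;
    }

    // Links the child to this section in both directions.
    public void Add(StackSection child)
    {
        child.Parent = this;
        Children.Add(child);
    }
}

public static class StackParser
{
    // Returns the root sections; every other section is reachable through Children.
    public static List<StackSection> Parse(IEnumerable<string> lines)
    {
        var roots = new List<StackSection>();
        var open = new Stack<StackSection>(); // Path from a root down to the most recent node.

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
            {
                continue; // Skip blank lines.
            }

            string text = line.TrimStart('-');
            int depth = line.Length - text.Length; // Number of leading dashes.
            var node = new StackSection(text, depth);

            // "Close" every open node that is at the same depth or deeper than this line.
            while (open.Count > 0 && open.Peek().Depth >= depth)
            {
                open.Pop();
            }

            if (open.Count == 0)
            {
                roots.Add(node);       // Nothing left open, so this is a root.
            }
            else
            {
                open.Peek().Add(node); // The innermost open node is the parent.
            }

            open.Push(node);
        }

        return roots;
    }
}
```

StackParser.Parse(File.ReadLines(path)) would be one way to feed it a file, where path is whatever file holds the indented text. The stack never holds more than one node per nesting level, so the whole parse is a single pass over the lines.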
