
How to parse a tree from a string in Python?

I have formatted all my university notes as follows:

CourseName: {
    Part 1: {
        I.I - Intro: {
            Topic1: {
                descr1;
                descr2: {
                    2.a;
                    2.b;
                    2.c.
                };
                descr3.
            };
            Topic2: {
                descr: {
                    example.
                }.
            }.
        };
        I.II - NextChapter: {
            Topic3: {
                whatever.
            }.
        }.
    };
    Part 2: {
        II.I - FinalChapter: {
            content.
        }.
    }.
}

I'd like to structure them into a tree data structure. Over the past few hours I have tried both recursive and iterative approaches and searched extensively online, but none of my attempts works.

I've already implemented a Node class (with self.__value, a list self.__children, and all the useful methods you would expect) as well as a Tree class (with self.__nodes as a dictionary, plus other utility methods), so feel free to use methods such as add_node or add_child in whatever form you like in your answers.

What I'm struggling with is how to structure the function def parseTree(s, l). Ideally it takes a string s (my notes) and a list l of delimiters, e.g. [":{", ";", "}."] or ["{", "}"] or similar, and returns a tree object in which each node's value is the text preceding :{ and whose children (if any) are the items separated by ; in the text.

Any suggestions?

This is actually almost syntactically valid YAML. A simple substitution will make it valid:

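import yaml  # PyYAML
# Note: replace('.', '') also removes the dots inside labels such as "I.I" and "2.a".
data = data.replace(';', ',').replace('.', '')
parsed = yaml.safe_load(data)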
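One side effect of the substitution is worth seeing concretely: a blanket replace on '.' also rewrites labels like "I.I - Intro". The sketch below uses only the standard library and compares it with a hypothetical helper, normalize_terminators (not part of the original answer), that touches a ';' or '.' only when it ends a line. It shows the text transformation only; whether the result then loads cleanly depends on the YAML parser.

```python
import re

sample = """I.I - Intro: {
    descr1;
    descr2: {
        2.a;
        2.c.
    };
    descr3.
}"""

# Blanket substitution: every '.' disappears, including those inside labels.
blanket = sample.replace(';', ',').replace('.', '')


def normalize_terminators(text):
    """Hypothetical helper: rewrite only a ';' or '.' that ends a line."""
    text = re.sub(r';[ \t]*$', ',', text, flags=re.MULTILINE)
    return re.sub(r'\.[ \t]*$', '', text, flags=re.MULTILINE)


targeted = normalize_terminators(sample)
print(blanket.splitlines()[0])   # label with its dot removed
print(targeted.splitlines()[0])  # label left intact
```

With the targeted version, "I.I - Intro" and "2.a" keep their dots while the trailing ';' and '.' terminators are still normalized.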

Assuming your data is stored in a file, you can build a simple class to parse the structure into a dictionary. You can recursively traverse the data by creating a new Notes object for each key found:

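import re

# Read the notes, dropping blank lines ('filename.txt' is a placeholder).
with open('filename.txt') as f:
    file_data = [line.strip() for line in f if line.strip()]

class Notes:
    def __init__(self, token_data):
        self.token_data = token_data  # shared iterator over the remaining lines
        self.current_dict = {}        # nested blocks at this level, keyed by label
        self.current_vals = []        # plain leaf lines at this level
        self.parse()

    def parse(self):
        while True:
            start = next(self.token_data, None)
            # Stop at the end of input or at the line that closes this block.
            if not start or '}' in start:
                break
            if start.endswith('{'):
                # Recurse: the child consumes lines up to its own closing brace.
                note = Notes(self.token_data)
                if note.current_vals:
                    # A list is needed here; in Python 3, filter() returns a lazy iterator.
                    result = [x for x in note.current_vals + [note.current_dict] if x]
                else:
                    result = note.current_dict
                # Unwrap one-element lists so a lone leaf is stored as a plain string.
                if isinstance(result, list) and len(result) == 1:
                    result = result[0]
                key = re.findall(r'[\w\s\-.]+', start)[0]
                self.current_dict[key] = result
            else:
                self.current_vals.append(start)

course_notes = Notes(iter(file_data)).current_dict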

Output:

{'CourseName':
    {'Part 1':
        {'I.I - Intro':
            {'Topic1': ['descr1;',
                        'descr3.',
                        {'descr2': ['2.a;', '2.b;', '2.c.']}],
             'Topic2': {'descr': 'example.'}},
         'I.II - NextChapter':
            {'Topic3': 'whatever.'}},
     'Part 2': {'II.I - FinalChapter': 'content.'}}}
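If Node objects are wanted rather than a dictionary, a stack-based pass over the lines is one way to do it. The following is a minimal sketch, not the asker's actual code: the Node class here is a hypothetical stand-in with a value, a children list, and add_child; the function name parse_tree is made up for illustration; the delimiters are hardcoded instead of being passed in as a list; and it assumes one item per line, as in the sample notes.

```python
class Node:
    """Hypothetical stand-in for the asker's Node class."""

    def __init__(self, value):
        self.value = value
        self.children = []

    def add_child(self, node):
        self.children.append(node)
        return node


def parse_tree(text):
    """Build a Node tree from the brace-delimited notes format (one item per line)."""
    root = None
    stack = []  # nodes whose blocks are still open, innermost last
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith('{'):
            # "label: {" opens a block; the label is the text before ':'.
            node = Node(line[:-1].rstrip().rstrip(':').rstrip())
            if stack:
                stack[-1].add_child(node)
            elif root is None:
                root = node
            else:
                raise ValueError('more than one top-level block')
            stack.append(node)
        elif line.startswith('}'):
            # "}", "};" or "}." closes the innermost open block.
            if not stack:
                raise ValueError('unbalanced closing brace')
            stack.pop()
        else:
            # A leaf line; drop its trailing ';' or '.' terminator.
            if not stack:
                raise ValueError('leaf outside of any block')
            stack[-1].add_child(Node(line.rstrip(';.').rstrip()))
    if stack:
        raise ValueError('unclosed block')
    return root


notes = """CourseName: {
    Part 1: {
        Topic1: {
            descr1;
            descr2: {
                2.a;
                2.b.
            };
            descr3.
        }.
    };
    Part 2: {
        content.
    }.
}"""

tree = parse_tree(notes)
print(tree.value, [child.value for child in tree.children])
```

Unlike the dictionary version, this keeps siblings in their original order and strips the ';' and '.' terminators from leaf values. Adapting it to a real Node class with private attributes would mean swapping the attribute access for that class's own methods.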
