Reading a simple DSL with Python SLY

Question

I am trying to create a simple DSL with Python SLY, but I can't get the result I expect because the parser doesn't read the input properly. Here is the code:

Lexer

from sly import Lexer

class ConfigLexer(Lexer):
  tokens = { ANIMALS, BLOOD, SKIN, BREATHE, ANIMAL_NAME, VALUE, ASSIGN }

  ignore    = " \t\r"
  ignore_newline = r'\n+'

  ANIMALS       = "ANIMALS"
  BLOOD         = "BLOOD"
  SKIN          = "SKIN"
  BREATHE       = "BREATHE"
  ANIMAL_NAME   = r'\{[a-zA-Z_][a-zA-Z0-9_]*\}'
  VALUE         = r'[a-zA-Z_][a-zA-Z0-9_,.: ]*'
  ASSIGN        = r'\='

Parser

from sly import Parser

class ConfigParser(Parser):
    tokens = ConfigLexer.tokens

    def __init__(self):
        self.config = dict()
        self.dict_attribute = dict()
        self.animal_name = ""
    
    @_("ANIMALS animaldetails")
    def animals(self, p):
        pass
    
    @_("ANIMAL_NAME animalnamedetails")
    def animaldetails(self, p):
        self.animal_name = p.ANIMAL_NAME.replace("{", "").replace("}","")
        if self.animal_name not in self.config:
            self.config[self.animal_name] = self.dict_attribute
    
    @_("BLOOD ASSIGN VALUE")
    def animalnamedetails(self, p):
        if p.BLOOD not in self.dict_attribute:
            self.dict_attribute[p.BLOOD] = p.VALUE
    
    @_("SKIN ASSIGN VALUE")
    def animalnamedetails(self, p):
        if p.SKIN not in self.dict_attribute:
            self.dict_attribute[p.SKIN] = p.VALUE
    
    @_("BREATHE ASSIGN VALUE")
    def animalnamedetails(self, p):
        if p.BREATHE not in self.dict_attribute:
            self.dict_attribute[p.BREATHE] = p.VALUE
    
    def get_config(self):
        return self.config

But when I run it:

import json
import ConfigLexer
import ConfigParser

if __name__ == '__main__':
    lexer = ConfigLexer()
    parser = ConfigParser()
    long_string = """ANIMALS
{MAMMALS}
BLOOD = WARM
SKIN = FUR
BREATHE = LUNGS
{FISH}
BLOOD = COLD
SKIN = SCALY
BREATHE = GILLS"""
    result = parser.parse(lexer.tokenize(long_string))
    cfg = parser.get_config()
    data_json = json.dumps(cfg, indent=3)
    print(data_json)

I expected the result to look like this:

data_json = {
'MAMMALS': {'BLOOD': 'WARM', 'SKIN': 'FUR', 'BREATHE': 'LUNGS'},
'FISH': {'BLOOD': 'COLD', 'SKIN': 'SCALY', 'BREATHE': 'GILLS'}
}

But I only get something like this:

data_json = {
   'MAMMALS': {
      'BLOOD': 'WARM'
   }
}

Result of executing:

sly: Syntax error at line 1, token=SKIN
{
   "MAMMALS": {
      "BLOOD": "WARM"
   }
}

I guess I have to edit the parser, but I can't think how, and I would appreciate any pointers you can give me.

Answer 1

You have non-terminals named animals, animaldetails, and animalnamedetails, in the plural, which would normally lead one to expect that the grammar for each of them allows a sequence of things. But they don't: each of these non-terminals parses a single thing. You've implemented the singular, and although the names are plural, there's no repetition.

That this was not your intent is evident from your example, which has multiple sections and multiple attributes in each section. But since the grammar describes only one attribute and value, the second attribute (SKIN) is a syntax error.

Traditionally, grammars implement sequences with pairs of non-terminals: a singular non-terminal that describes a single thing, and a plural non-terminal that describes how lists are formed (simple concatenation, or separation by punctuation). So you might have:

file: sections
sections: empty
        | sections section
section: category settings
settings: empty
        | settings setting
setting: attribute '=' value

You should probably also look for a description of how to manage semantic values. Storing intermediate results in class members, as you do, works only when the grammar doesn't allow nesting, which is relatively unusual. It's a technique that will almost always get you into trouble. Instead, the semantic action of each production should manage these values:

A production for a singular object should create and return a representation of that object.
A plural → empty production should create and return a representation of an empty collection.
Similarly, a production of the form things → things thing should append the new thing to the aggregate of things, and then return the augmented aggregate.

Answer 2

Cheers...

from json import dumps

from sly import Lexer, Parser

class MyLexer(Lexer):
    tokens = {ANIMALS, ANIMAL_NAME, BLOOD, SKIN, BREATHE, ASSIGN, ASSIGN_VALUE}
    ignore = ' \t'

    ANIMALS = r'ANIMALS'
    BLOOD = r'BLOOD'
    SKIN = r'SKIN'
    BREATHE = r'BREATHE'

    ASSIGN = r'='
    ASSIGN_VALUE = r'[a-zA-Z_][a-zA-Z0-9_]*'

    @_(r'\{[a-zA-Z_][a-zA-Z0-9_]*\}')
    def ANIMAL_NAME(self, t):
        t.value = str(t.value).lstrip('{').rstrip('}')
        return t

    @_(r'\n+')
    def NEWLINE(self, t):
        self.lineno += t.value.count('\n')


class MyParser(Parser):
    tokens = MyLexer.tokens

    def __init__(self):
        self.__config = {}

    def __del__(self):
        print(dumps(self.__config, indent=4))

    @_('ANIMALS animal animal')
    def animals(self, p):
        pass

    @_('ANIMAL_NAME assignment assignment assignment')
    def animal(self, p):
        if p.ANIMAL_NAME not in self.__config:
            self.__config[p.ANIMAL_NAME] = {}
        animal_name, *assignments = p._slice
        for assignment in assignments:
            assignment_key, assignment_value = assignment.value
            self.__config[p.ANIMAL_NAME][assignment_key] = assignment_value

    @_('key ASSIGN ASSIGN_VALUE')
    def assignment(self, p):
        return p.key, p.ASSIGN_VALUE

    @_('BLOOD', 'SKIN', 'BREATHE')
    def key(self, p):
        return p[0]


if __name__ == '__main__':
    lexer = MyLexer()
    parser = MyParser()
    text = '''ANIMALS
{MAMMALS}
BLOOD = WARM
SKIN = FUR
BREATHE = LUNGS
{FISH}
BLOOD = COLD
SKIN = SCALY
BREATHE = GILLS
'''
    parser.parse(lexer.tokenize(text))

Output:

{
    "MAMMALS": {
        "BLOOD": "WARM",
        "SKIN": "FUR",
        "BREATHE": "LUNGS"
    },
    "FISH": {
        "BLOOD": "COLD",
        "SKIN": "SCALY",
        "BREATHE": "GILLS"
    }
}


Answer 1
0 2022-06-13 06:12:49

Answer 2
-1 2022-06-13 09:13:44
