
Best way to parse complex configuration file

I need to parse a complex configuration file using Python. I should note that the format of these files is something I cannot change, but rather have to live with.

The basic structure of the file is this:

Keyword1
"value1"
thisisirrelevantforkeyword1
Keyword2
"first", "second", "third"
1, 2, 3

Keyword3
2, "whatever"
firstparam, 1
secondparam, 2
again_not_relevant

Ultimately, the output of this should be a JSON string.

Let me explain:

  • Each keyword has its own rules.
  • The values are in the line(s) following the keyword.
  • For example, Keyword1 has one value, which is the string value1. The line following value1 is irrelevant.
  • For example, Keyword2 has two parameters, the first one being a list of strings, the second one a list of integers.
  • For example, Keyword3 has a variable number of parameters, indicated by the first integer in the first line after Keyword3. So the parameters relevant for Keyword3 are the list 2, "whatever", and the two lists in the two following lines.

There is a fixed set of keywords, each with its own rules. Of course, I could in principle hard-code the whole thing, but that would lead to a lot of code duplication. It would also be quite inflexible when new keywords are added or the rules for a single keyword change.

I'd rather prepare a CSV file listing all keywords together with the rule that defines each one, and then use this as input for a more generic parser function.

So my questions are:

  • How do I specify the rules in a simple way? I'm sure there are standards for this, but I have absolutely no idea where to start looking.
  • How could I then use this grammar to parse the file and generate my JSON?

I know this is quite a specific and complex problem, so I'd already be thankful for pointers in the right direction, as I feel a bit lost and am unsure where to start looking.

I think you could write a class for each option that has really special rules.

Something like this:

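class OptionBase(object):
    """Collects the raw lines that follow a keyword."""

    def __init__(self, name, **options):
        self.name = name
        self.raw_config_lines = []

    def parse_line(self, line):
        # Store non-blank lines; subclasses decide how to validate them.
        line = line.strip()
        if line:
            self.raw_config_lines.append(line)

    def get_config(self):
        # Subclasses turn raw_config_lines into the final value.
        raise NotImplementedError


class SimpleOption(OptionBase):
    """An option with exactly one value."""

    def __init__(self, name, **options):
        super(SimpleOption, self).__init__(name, **options)
        # Callable used to convert the raw line; defaults to str.
        self.expected_format = options.get('expected_format', str)

    def parse_line(self, line):
        if self.raw_config_lines:
            raise ValueError('SimpleOption can only have one value')
        super(SimpleOption, self).parse_line(line)

    def get_config(self):
        if not self.raw_config_lines:
            return None
        return self.expected_format(self.raw_config_lines[0])


class SomeComplexOption(OptionBase):
    """Example only: a Keyword3-style rule, where the first field of the
    first line gives the number of parameter lines that follow."""

    def parse_line(self, line):
        # Special code that verifies the number of lines, type of args, etc.
        if self.raw_config_lines:
            expected = int(self.raw_config_lines[0].split(',')[0])
            if len(self.raw_config_lines) > expected:
                return  # ignore lines past the declared parameter count
        super(SomeComplexOption, self).parse_line(line)

    def get_config(self):
        # Transform the raw lines into another format:
        # one list of stripped fields per line.
        return [[field.strip() for field in line.split(',')]
                for line in self.raw_config_lines]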
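
To tie this together, a driver loop could route each line to the option object of the most recent keyword and then dump everything to JSON. The following is only a rough sketch under a few assumptions: parse_config, the registry dict and the ListOption stand-in class are names made up for this illustration, and fields are split with the standard csv module, which may not match every rule of the real file format.

```python
import csv
import json


class ListOption(object):
    """Hypothetical stand-in option: keeps every non-blank line as a list of fields."""

    def __init__(self, name):
        self.name = name
        self.raw_config_lines = []

    def parse_line(self, line):
        line = line.strip()
        if line:
            self.raw_config_lines.append(line)

    def get_config(self):
        # csv.reader handles the quoted strings; skipinitialspace drops
        # the blank after each comma. All fields stay strings here.
        return [next(csv.reader([line], skipinitialspace=True))
                for line in self.raw_config_lines]


def parse_config(text, registry):
    """Feed each line to the option of the most recent keyword, return JSON.

    registry maps a keyword to any object that offers parse_line() and
    get_config(); this mapping is an assumption of the sketch.
    """
    current = None
    for line in text.splitlines():
        keyword = line.strip()
        if keyword in registry:
            current = registry[keyword]   # a new keyword section starts
        elif current is not None:
            current.parse_line(line)      # the option applies its own rules
    config = {name: option.get_config() for name, option in registry.items()}
    return json.dumps(config, sort_keys=True)


sample = 'Keyword2\n"first", "second", "third"\n1, 2, 3\n'
print(parse_config(sample, {'Keyword2': ListOption('Keyword2')}))
```

With this layout, supporting a new keyword only means adding one entry to the registry, and each option class remains responsible for deciding which of its lines are relevant.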
