What is the best (error-proof/foolproof) way to parse a file in the following format using Python?

Question

########################################
# some comment
# other comment
########################################

block1 {
    value=data
    some_value=some other kind of data
    othervalue=032423432
    }

block2 {
    value=data
    some_value=some other kind of data
    othervalue=032423432
    }

Answer 1

The best approach is to use an existing format, such as JSON.

Here is an example parser for your format:
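As a rough illustration of that suggestion, this is how the same data might look as JSON, read with the standard-library json module. The layout (one JSON object per block) is an assumption, not something the question specifies. Note that JSON numbers cannot have leading zeros, so 032423432 is kept as a string in this sketch.

```python
import json

# Hypothetical JSON version of the two blocks from the question.
text = """
{
  "block1": {
    "value": "data",
    "some_value": "some other kind of data",
    "othervalue": "032423432"
  },
  "block2": {
    "value": "data",
    "some_value": "some other kind of data",
    "othervalue": "032423432"
  }
}
"""

# json.loads raises json.JSONDecodeError on malformed input,
# so syntax errors are reported instead of silently ignored.
config = json.loads(text)

for block_name, settings in sorted(config.items()):
    print(block_name, settings["value"], settings["othervalue"])
```

With an existing format the error handling comes from the library, which is the main appeal over a hand-written parser.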

from lepl import (AnyBut, Digit, Drop, Eos, Integer, Letter,
                  NON_GREEDY, Regexp, Space, Separator, Word)

# EBNF
# name = ( letter | "_" ) , { letter | "_" | digit } ;
name = Word(Letter() | '_',
            Letter() | '_' | Digit())
# words = word , space+ , word , { space+ , word } ;
# two or more space-separated words (non-greedy to allow comment at the end)
words = Word()[2::NON_GREEDY, ~Space()[1:]] > list
# value = integer | word | words  ;
value = (Integer() >> int) | Word() | words
# comment = "#" , { all characters - "\n" } , ( "\n" | EOF ) ;
comment = '#' & AnyBut('\n')[:] & ('\n' | Eos())

with Separator(~Regexp(r'\s*')):
    # statement = name , "=" , value ;
    statement = name & Drop('=') & value > tuple
    # suite     = "{" , { comment | statement } , "}" ;
    suite     = Drop('{') & (~comment | statement)[:] & Drop('}') > dict
    # block     = name , suite ;
    block     = name & suite > tuple
    # config    = { comment | block } ;
    config    = (~comment | block)[:] & Eos() > dict

from pprint import pprint

pprint(config.parse(open('input.cfg').read()))

Output:

[{'block1': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'},
  'block2': {'othervalue': 32423432,
             'some_value': ['some', 'other', 'kind', 'of', 'data'],
             'value': 'data'}}]

Answer 2

Well, the data looks pretty regular. So you could do something like this (untested):

class Block(object):
    def __init__(self, name):
        self.name = name

infile = open(...)  # insert filename here
current = None
blocks = []

for line in infile:
    if line.lstrip().startswith('#'):
        continue
    elif line.rstrip().endswith('{'):
        current = Block(line.split()[0])
    elif '=' in line:
        # split on the first '=' only, so a value may itself contain '='
        attr, value = line.strip().split('=', 1)
        try:
            value = int(value)
        except ValueError:
            pass
        setattr(current, attr, value)
    elif line.rstrip().endswith('}'):
        blocks.append(current)

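The result will be a list of Block instances, where block.name is the name ('block1', 'block2', etc.) and the other attributes correspond to the keys in your data. So blocks[0].value will be 'data', and so on. Note that this only handles strings and integers as values.

(If your keys can include 'name', there is an obvious bug here. You may want to change self.name to self._name or something similar if that can happen.)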
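One way to sidestep the 'name' collision is to store each block's keys in a plain dict rather than as attributes. The following is a minimal sketch of that variant, not the code above: the function name parse_blocks and the strict error checks are assumptions added for illustration.

```python
def parse_blocks(text):
    """Parse 'name { key=value ... }' blocks into a dict of dicts."""
    blocks = {}
    current = None  # name of the block being filled, or None outside a block
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and comments
        if line.endswith('{') and '=' not in line:
            if current is not None:
                raise ValueError("line %d: nested block" % lineno)
            current = line[:-1].strip()
            if not current or current in blocks:
                raise ValueError("line %d: bad or duplicate block name" % lineno)
            blocks[current] = {}
        elif line == '}':
            if current is None:
                raise ValueError("line %d: '}' outside a block" % lineno)
            current = None
        elif '=' in line and current is not None:
            key, value = line.split('=', 1)  # split on the first '=' only
            value = value.strip()
            try:
                value = int(value)  # integers become int, the rest stay str
            except ValueError:
                pass
            blocks[current][key.strip()] = value
        else:
            raise ValueError("line %d: unexpected %r" % (lineno, raw))
    if current is not None:
        raise ValueError("unterminated block %r" % current)
    return blocks


sample = """
# some comment
block1 {
    name=still fine as a key
    othervalue=032423432
    }
"""
print(parse_blocks(sample))
```

Because keys live in a dict, a key called 'name' is just another entry, and anything that does not fit the expected shape raises ValueError with a line number.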

HTH！

Answer 3

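If you don't really mean parsing, but rather text processing, and the input data really is that regular, then go with John's solution. If you really do need some parsing (for example, the data you receive follows more complex rules), then, depending on the amount of data you need to parse, I'd go with either pyparsing or simpleparse. I've tried both of them, but pyparsing was actually too slow for me.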

Answer 4

You might look into something like pyparsing.

Answer 5

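Grako (for grammar compiler) lets you separate the input format specification (grammar) from its interpretation (semantics). Here is a grammar for your input format in Grako's variety of EBNF: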

(* a file contains zero or more blocks *)
file = {block} $;
(* a named block has at least one assignment statement *)
block = name '{' {assignment}+ '}';
assignment = name '=' value NEWLINE;
name = /[a-z][a-z0-9_]*/;
value = integer | string;
NEWLINE = /\n/;
integer = /[0-9]+/;
(* string value is everything until the next newline *)
string = /[^\n]+/;

To install grako, run pip install grako. To generate the PEG parser from the grammar:

$ grako -o config_parser.py Config.ebnf

Use the generated config_parser module to convert stdin to JSON:

#!/usr/bin/env python
import json
import string
import sys
from config_parser import ConfigParser

class Semantics(object):
    def file(self, ast):
        # file = {block} $
        # all blocks should have unique names within the file
        return dict(ast)
    def block(self, ast):
        # block = name '{' {assignment}+ '}'
        # all assignment statements should use unique names
        return ast[0], dict(ast[2])
    def assignment(self, ast):
        # assignment = name '=' value NEWLINE
        # value = integer | string
        return ast[0], ast[2] # name, value
    def integer(self, ast):
        return int(ast)
    def string(self, ast):
        return ast.strip() # remove leading/trailing whitespace

parser = ConfigParser(whitespace='\t\n\v\f\r ', eol_comments_re="#.*?$")
ast = parser.parse(sys.stdin.read(), rule_name='file', semantics=Semantics())
json.dump(ast, sys.stdout, indent=2, sort_keys=True)

Output

{
  "block1": {
    "othervalue": 32423432,
    "some_value": "some other kind of data",
    "value": "data"
  },
  "block2": {
    "othervalue": 32423432,
    "some_value": "some other kind of data",
    "value": "data"
  }
}
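To make the ordered choice in the grammar's value rule (integer first, then string) more concrete, here is a small standard-library sketch that applies the same regular expressions to a single assignment line. It is not Grako and not the generated parser; parse_value and parse_assignment are hypothetical helper names, and unlike the string rule in the grammar this sketch would also accept an empty value.

```python
import re

# Regular expressions copied from the grammar's name and integer rules.
NAME = re.compile(r"[a-z][a-z0-9_]*")
INTEGER = re.compile(r"[0-9]+")


def parse_value(text):
    """value = integer | string: try integer first, fall back to string."""
    text = text.strip()
    if INTEGER.fullmatch(text):
        return int(text)  # leading zeros are dropped, as in the output above
    return text


def parse_assignment(line):
    """assignment = name '=' value, for one line of input."""
    name, sep, raw = line.partition("=")
    name = name.strip()
    if not sep or not NAME.fullmatch(name):
        raise ValueError("not an assignment: %r" % line)
    return name, parse_value(raw)


print(parse_assignment("    othervalue=032423432"))
print(parse_assignment("    some_value=some other kind of data"))
```

A value such as 123abc only partly matches the integer pattern, so it falls through to the string branch, which mirrors how the ordered choice backtracks.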

5 answers

Answer 1: score 6, accepted, 2009-10-31 16:05:47

Answer 2: score 4, 2009-01-29 21:34:23

Answer 3: score 3, 2009-01-29 22:24:17

Answer 4: score 2, 2009-01-29 22:15:12

Answer 5: score 1, 2014-10-05 15:23:40
