Parsing structured text file in Python

I need to parse text files similar to the one below with Python, build a hierarchical object structure from the data, and then process it. This is very similar to what xml.etree.ElementTree and other XML parsers let us do.

However, the syntax of these files is not XML, and I'm wondering what the best way to implement such a parser is: subclass an XML parser (which one?) and customize its tag recognition, write a custom parser, or something else.

{NETLIST topblock
{VERSION 2 0 0}

{CELL topblock
    {PORT gearshift_h vpsf vphreg pwron_h vinp vref_out vcntrl_out gd meas_vref 
      vb vout meas_vcntrl reset_h vinm }
    {INST XI21/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN vpsf=GATE vpsf=BULK }}
    {INST XI21/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_h=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI20/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_hn=DRN vpsf=GATE vpsf=BULK }}
    {INST XI20/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gs_hn=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI19/Mdummy1=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC net514=DRN vpsf=GATE vpsf=BULK }}
    {INST XI19/Mdummy2=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN net514=SRC gd=DRN gd=GATE gd=BULK }}
    {INST XI21/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
    {PIN gd=SRC gs_h=DRN gs_hn=GATE gd=BULK }}
    {INST XI21/MP0=pch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/pch_18_mac" Length=0.152 NFIN=8 }
    {PIN vpsf=SRC gs_h=DRN gs_hn=GATE vpsf=BULK }}
    {INST XI20/MN0=nch_18_mac {TYPE MOS} {PROP n="sctg_inv1x/nch_18_mac" Length=0.152 NFIN=5 }
...
}
}

First of all, you should check whether a parser already exists for your file format. Apparently there is one: Python-based Verilog Parser (currently Netlist only)

If you can't find anything suitable, you can build a parser with one of the many available parser-building libraries, for example pyparsing. Subclassing an XML parser doesn't seem to be a good idea.
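
For comparison, the custom-parser route the question mentions can also be sketched with the standard library alone. This is not taken from either answer. It assumes the format consists only of nested {HEAD item ...} blocks whose items are bare words, key=value pairs (the value optionally double-quoted) or further blocks, as in the sample above. The names TOKEN_RE, tokenize, parse_block and parse are illustrative.

```python
import re

# Tokens: a brace, a key=value pair (value may be double-quoted), or a bare word.
TOKEN_RE = re.compile(r'''
    (?P<brace>[{}])
  | (?P<pair>[^\s{}="]+=(?:"[^"]*"|[^\s{}]+))
  | (?P<word>[^\s{}]+)
''', re.VERBOSE)


def tokenize(text):
    """Yield (kind, token) tuples; whitespace between tokens is skipped."""
    for match in TOKEN_RE.finditer(text):
        yield match.lastgroup, match.group()


def parse_block(tokens):
    """Parse one block whose opening '{' has already been consumed."""
    kind, head = next(tokens, (None, None))
    if kind != 'word':
        raise ValueError('expected a head word after "{"')
    body = []
    for kind, tok in tokens:
        if kind == 'brace':
            if tok == '}':
                return head, body
            body.append(parse_block(tokens))  # nested block
        elif kind == 'pair':
            key, _, value = tok.partition('=')
            body.append((key, value))
        else:
            body.append(tok)
    raise ValueError('unbalanced braces: missing "}"')


def parse(text):
    """Return a list of (head, body) tuples, one per top-level block."""
    tokens = tokenize(text)
    blocks = []
    for kind, tok in tokens:
        if tok != '{':
            raise ValueError('expected "{" at top level, got %r' % tok)
        blocks.append(parse_block(tokens))
    return blocks


# Hypothetical miniature input in the same style as the netlist above.
SAMPLE = '{CELL top {PORT a b} {INST X1/M1=pch {PROP n="lib/pch" NFIN=8}}}'
print(parse(SAMPLE))
```

Each block comes back as a (head, body) tuple, with pairs as (key, value) tuples and bare words as strings. Quoted values keep their quotes, and a quoted string containing spaces is only handled on the right-hand side of a pair. A parser library remains the better choice once the grammar grows beyond this.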

As others said in the comments: use an existing parser. If none exists, roll your own, but use a parser library. Here is an example with Parcon:

from pprint import pprint
from parcon import (Forward, SignificantLiteral, Word, alphanum_chars, Exact,
                    ZeroOrMore, CharNotIn, concat, OneOrMore)

# Forward declaration, because blocks can contain other blocks.
block = Forward()
quote = SignificantLiteral('"')
word = Word(alphanum_chars + '/_.)')
# A value is a bare word or a double-quoted string (the quotes are kept).
value = word | Exact(quote + ZeroOrMore(CharNotIn('"')) + quote)[concat]
pair = word + '=' + value
flag = word
attribute = pair | flag | block
head = word
body = ZeroOrMore(attribute)
# A block is '{', a head word, any number of attributes, then '}'.
block << '{' + head + body + '}'
blocks = OneOrMore(block)

with open('<your file name>.txt') as infile:
    pprint(blocks.parse_string(infile.read()))

Result:

[('NETLIST',
  ['topblock',
   ('VERSION', ['2', '0', '0']),
   ('CELL',
    ['topblock',
     ('PORT',
      ['gearshift_h',
       'vpsf',
       'vphreg',
       'pwron_h',
       'vinp',
       'vref_out',
       'vcntrl_out',
       'gd',
       'meas_vref',
       'vb',
       'vout',
       'meas_vcntrl',
       'reset_h',
       'vinm']),
     ('INST',
      [('XI21/Mdummy1', 'pch_18_mac'),
       ('TYPE', ['MOS']),
       ('PROP',
        [('n', '"sctg_inv1x/pch_18_mac"'),
         ('Length', '0.152'),
         ('NFIN', '8')]),
       ('PIN',
        [('vpsf', 'SRC'),
         ('gs_h', 'DRN'),
         ('vpsf', 'GATE'),
         ('vpsf', 'BULK')])]),
     ('INST',
        ...
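
Since the question asks for a hierarchical object structure, the nested tuples and lists can be folded into objects in a second pass. The sketch below is an assumption layered on top of the answer, not part of it. It assumes the result has exactly the shape printed above: a block is a (head, list) tuple, a pair is a (key, string) tuple, and a flag is a plain string. Node and build are hypothetical names. Pairs are kept as a list rather than a dict because keys repeat, as vpsf does inside PIN.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One {HEAD ...} block with its flags, key=value pairs and sub-blocks."""
    name: str
    words: list = field(default_factory=list)
    pairs: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def find_all(self, name):
        """Return the direct child blocks with the given head."""
        return [child for child in self.children if child.name == name]


def build(block):
    """Convert one (head, body) tuple into a Node tree."""
    name, body = block
    node = Node(name)
    for item in body:
        if isinstance(item, str):
            node.words.append(item)            # flag, e.g. 'topblock'
        elif isinstance(item[1], str):
            node.pairs.append(tuple(item))     # pair, e.g. ('NFIN', '8')
        else:
            node.children.append(build(item))  # nested block
    return node


# Hypothetical fragment in the shape of the result shown above.
parsed = ('CELL', ['top', ('INST', [('X1', 'pch'), ('TYPE', ['MOS'])])])
cell = build(parsed)
for inst in cell.find_all('INST'):
    print(inst.pairs, [child.name for child in inst.children])
```

This gives lookups such as cell.find_all('INST'), loosely comparable to findall in xml.etree.ElementTree.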
