How can I parse a formatted file into variables using Python?

I have a pre-formatted text file with some variables in it, like this:

header one
   name = "this is my name"
   last_name = "this is my last name"
   addr = "somewhere"
   addr_no = 35
header
header two
   first_var = 1.002E-3
   second_var = -2.002E-8
header 

As you can see, each scope starts with the string header followed by the name of the scope (one, two, etc.) and ends with a bare header line.

I can't figure out how to programmatically parse those options using Python so that they would be accessible to my script in this manner:

one.name = "this is my name"
one.last_name = "this is my last name"
two.first_var = 1.002E-3

Can anyone point me to a tutorial, a library, or a specific part of the docs that would help me achieve my goal?

I'd parse that with a generator, yielding sections as you parse the file. ast.literal_eval() takes care of interpreting the value as a Python literal:

import ast

def load_sections(filename):
    """Yield (section_name, section_dict) pairs from a header-delimited file."""
    with open(filename, 'r') as infile:
        for line in infile:
            if not line.startswith('header'):
                continue  # skip to the next line until we find a header

            sectionname = line.split(None, 1)[-1].strip()
            section = {}
            # The inner loop keeps consuming the same file iterator
            for line in infile:
                if line.startswith('header'):
                    break  # end of section
                line = line.strip()
                if not line:
                    continue  # ignore blank lines inside a section
                key, value = line.split(' = ', 1)
                section[key] = ast.literal_eval(value)

            yield sectionname, section

Loop over the above function to receive (name, section_dict) tuples:

for name, section in load_sections(somefilename):
    print(name, section)

For your sample input data, that results in:

>>> for name, section in load_sections('/tmp/example'):
...     print(name, section)
... 
one {'name': 'this is my name', 'last_name': 'this is my last name', 'addr': 'somewhere', 'addr_no': 35}
two {'first_var': 0.001002, 'second_var': -2.002e-08}
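
To illustrate how ast.literal_eval() treats the right-hand sides in the sample file, here is a small sketch. The raw_values list is just the sample values copied by hand, not something produced by the parser:

```python
import ast

# Right-hand sides copied by hand from the sample file in the question.
raw_values = ['"this is my name"', '35', '1.002E-3', '-2.002E-8']

for raw in raw_values:
    value = ast.literal_eval(raw)
    # Show the source text, the resulting Python type, and the value.
    print(repr(raw), '->', type(value).__name__, repr(value))

# An unquoted word is not a Python literal, so literal_eval() rejects it
# with a ValueError instead of evaluating it as a name.
try:
    ast.literal_eval('somewhere')
except ValueError as exc:
    print('rejected:', exc)
```

Quoted values come back as str, whole numbers as int, and scientific notation as float. Anything that is not a literal raises an exception, which is why this is a safer choice than eval() for a data file.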

Martijn Pieters is correct in his answer given your preformatted file, but if you can format the file differently in the first place, you will avoid a lot of potential bugs. If I were you, I would look into getting the file formatted as JSON (or XML), because then you would be able to use Python's json (or XML) libraries to do the work for you: http://docs.python.org/2/library/json.html. Unless you're working with really bad legacy code or a system that you don't have access to, you should be able to go into the code that writes the file in the first place and make it give you a better file.
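
As a rough sketch of that suggestion, assume the program that writes the file could emit one JSON object per section. The layout below is a guess for illustration, not something taken from the question. The standard json module would then load it directly:

```python
import json

# Hypothetical JSON version of the sample data from the question.
text = '''
{
  "one": {"name": "this is my name", "last_name": "this is my last name",
          "addr": "somewhere", "addr_no": 35},
  "two": {"first_var": 1.002E-3, "second_var": -2.002E-8}
}
'''

# json.loads() returns nested dicts with str, int and float values.
sections = json.loads(text)

print(sections['one']['name'])
print(sections['two']['first_var'])
```

For a file on disk, json.load() with an open file object would take the place of json.loads().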

def get_section(f):
    # Collect stripped, non-blank lines up to and including the closing 'header'
    section = []
    for line in f:
        line = line.strip()
        if not line:
            continue
        section.append(line)
        if line == 'header':
            break
    return section

sections = {}
with open('input') as f:
    while True:
        section = get_section(f)
        if not section:
            break
        section_dict = {}
        section_dict['sname'] = section[0].split()[1]
        # Skip the opening 'header <name>' line and the closing 'header' line
        for param in section[1:-1]:
            k, v = [x.strip() for x in param.split('=', 1)]
            section_dict[k] = v  # values stay raw strings, quotes included
        sections[section_dict['sname']] = section_dict

print(sections['one']['name'])

You can also access these sections as attributes:

class Section:
    def __init__(self, d):
        self.__dict__ = d

one = Section(sections['one'])
print(one.name)
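
On Python 3.3 or later, types.SimpleNamespace from the standard library gives a similar effect without a custom class. A minimal sketch, with the sections dict written out by hand to stand in for the parsed result:

```python
from types import SimpleNamespace

# Stand-in for the parsed result; in practice this comes from the parser.
sections = {
    'one': {'name': 'this is my name', 'addr_no': 35},
    'two': {'first_var': 1.002E-3},
}

# Each dict key becomes an attribute on the namespace object.
one = SimpleNamespace(**sections['one'])
two = SimpleNamespace(**sections['two'])

print(one.name)
print(two.first_var)
```

This only works cleanly when every key is a valid Python identifier, which holds for the sample file.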
