
Python: read a series of information in a text file between two strings

I have a text file in the following format:

BEGIN *A information here* END
BEGIN *B information here* END
BEGIN *C information here*
    *C additional information here*
    *C additional information here*
    BEGIN *C secondary information here*
          *C additional secondary information*
          BEGIN *C tertiary information* END
    END
    BEGIN *C secondary information*
    END
END
BEGIN *D information here* END

I want to read the information between BEGIN and END and keep it in the same nested format, as a list of lists. I tried replacing 'BEGIN' and 'END' with '[' and ']' respectively and then evaluating the resulting string, but that raises a syntax error when it hits a number in the information. This is the code I tried:

with open(filepath) as infile:
    mylist = []
    for line in infile:
        line = line.strip()
        line = line.replace('BEGIN', '[')
        line = line.replace('END', ']')
        mylist.append(line)

for n in mylist:
    print(n)

which produces:

[ *A information here* ]
[ *B information here* ]
[ *C information here*
*C additional information here*
*C additional information here*
[ *C secondary information here*
*C additional secondary information*
[ *C tertiary information* ]
]
[ *C secondary information*
]
]
[ *D information here* ]

Is there any way to get the data out as a list of lists, like so:

>>> for n in mylist:
...     print(n)
[*A information here*]
[*B information here*]
[*C information here* *C additional information here* [*C secondary information here* *C additional secondary information* [*C tertiary information*]] [*C secondary information*]]
[*D information here*]

Assuming the file doesn't contain any brackets, you could replace "BEGIN" and "END" with brackets as you did, then write a recursive function to parse the result:

def parse(text):
    result = [""]  # alternating text segments and nested sub-lists
    j = -1  # index of the closing bracket of the last block already handled
    for i in range(len(text)):  # iterate over indices of characters
        if i <= j:
            continue  # still inside a block that a recursive call has parsed
        if text[i] == "[":
            nestlevel = 1  # number of nested blocks we are currently inside
            j = i
            while nestlevel != 0:  # loop until outside all nested blocks
                j += 1
                # increment or decrement nest level on encountering brackets
                if text[j] == "[":
                    nestlevel += 1
                elif text[j] == "]":
                    nestlevel -= 1
            # data block goes from index i+1 to index j-1
            result.append(parse(text[i+1:j]))  # slicing doesn't include end bound element
            result.append("")  # start a new text segment after the block
        else:
            result[-1] = result[-1] + text[i]
    return result

with open(filepath) as f:
    data = parse(f.read().replace("BEGIN", "[").replace("END", "]"))

This is just a rough idea, and I'm sure it could be optimized and improved in other ways. Also, it can return empty strings where there is no text between sub-lists.
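One possible way to avoid both the repeated scanning and the empty strings is to split the text on the BEGIN/END markers and keep a stack of open lists. The following is only a sketch, not the answer's own method: the names `TOKEN` and `parse_blocks` are made up for illustration, and it assumes BEGIN and END appear as whole words only when they delimit a block.

```python
import re

# Assumption: BEGIN/END are block markers only when they appear as whole words.
TOKEN = re.compile(r"\b(BEGIN|END)\b")

def parse_blocks(text):
    """Return nested lists mirroring the BEGIN ... END blocks in text."""
    root = []
    stack = [root]  # stack[-1] is the list currently being filled
    # With a capturing group, re.split keeps the markers in its result.
    for piece in TOKEN.split(text):
        if piece == "BEGIN":
            child = []
            stack[-1].append(child)
            stack.append(child)
        elif piece == "END":
            if len(stack) == 1:
                raise ValueError("END without matching BEGIN")
            stack.pop()
        else:
            # Collapse runs of whitespace and newlines; skip empty pieces.
            cleaned = " ".join(piece.split())
            if cleaned:
                stack[-1].append(cleaned)
    if len(stack) != 1:
        raise ValueError("BEGIN without matching END")
    return root
```

Calling `parse_blocks(f.read())` on the open file would give one sub-list per top-level block. Text that spans several lines inside a block is joined into a single string here, which is a design choice of this sketch rather than a requirement.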

I have managed to get it working with the following code:

def getObjectData(filepath):
    with open(filepath) as infile:
        mylist = []
        linenum = 0
        varcount = 0
        varlinedic = {}  # top-level block number -> line number of its last line
        for line in infile:
            line = line.replace('BEGIN', '[').replace('END', ']')
            linenum += 1
            # only unindented lines start a new top-level block
            if line.startswith('['):
                varcount += 1

            varlinedic[varcount] = linenum
            mylist.append(line.strip())

    for key in sorted(varlinedic):
        # each block runs from the end of the previous block to its own last line
        start = varlinedic.get(key - 1, 0)
        print(mylist[start:varlinedic[key]])

# getObjectData prints each block itself and returns None,
# so this outer print also prints None
print(getObjectData(filepath))

It prints:

['[ *A information here* ]']
['[ *B information here* ]']
['[ *C information here*', '*C additional information here*', '*C additional information here*', '[ *C secondary information here*', '*C additional secondary information*', '[ *C tertiary information* ]', ']', '[ *C secondary information*', ']', ']']
['[ *D information here* ]']
None
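The same grouping of lines into top-level blocks could also be done by tracking nesting depth instead of line numbers. This is a rough sketch under assumptions: the function name `top_level_blocks` is hypothetical, BEGIN and END are assumed to occur as whole words only when they delimit a block, and a non-blank line outside any block ends up as its own one-line group.

```python
import re

def top_level_blocks(lines):
    """Group stripped lines into one list per top-level BEGIN ... END block."""
    blocks = []
    current = []
    depth = 0  # number of BEGIN markers not yet closed by an END
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        current.append(stripped)
        # Assumption: whole-word BEGIN/END never occur inside the data itself.
        depth += len(re.findall(r"\bBEGIN\b", stripped))
        depth -= len(re.findall(r"\bEND\b", stripped))
        if depth == 0:
            # the current top-level block is complete
            blocks.append(current)
            current = []
    if current:
        raise ValueError("unclosed BEGIN block")
    return blocks
```

Because it accepts any iterable of lines, it can be fed an open file directly, for example `with open(filepath) as infile: blocks = top_level_blocks(infile)`. Unlike the function above, it returns the groups instead of printing them.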

