Parse text file between specific lines

I have a text file that looks like this, and I want to create a list for each block of data.

[Blocktype A]
thing
thing
thing

[Blocktype A]
thing
thing
thing
thing
thing

[Blocktype A]
thing
thing

[Blocktype B]
thing
thing
thing

Essentially, I want my code to do this:

If the line == '[Blocktype A]', append the following lines (the number can vary) to a 'block/stanza' list until a blank line is reached. At that point, append this 'block' list to an overall list, empty the 'block' list, and do the same for the next Blocktype A stanza until the next blank line, and so on. I want to do the same for '[Blocktype B]'.

In the end, I'm trying to get a list that has sub-lists as elements. In other words, one list of all the [Blocktype A] blocks and one list of all the [Blocktype B] blocks:

bigListA = [ ['Blocktype A', 'thing', 'thing', 'thing'], ['Blocktype A', 'thing', 'thing', 'thing', 'thing', 'thing'], etc...]

bigListB = same as above

I am unsure how to parse between specific lines like this. Any ideas? Thanks so much!

Edit: here is my code. The issue is that the ['B'] stanzas are getting added to lists they aren't supposed to be in. I feel like my list-emptying steps are off. Another issue I just caught is that when I print the elements of the returned list, every element is the same (only the first block in the file, repeated).

def getBlock(myFile):
    """
    blah blah blah parses by stanza
    """
    print myFile
    with open(myFile, 'r') as inFile:
        print '~~~ newfile ~~~\n\n'
        extraData = list()
        blockList = list()
        for line in inFile:
            if line.strip() == '': # skips extraData, start of data blocks
                termBlock = list()
                for line in inFile:
                    if line.strip() == '[A]' and len(termBlock) !=0: # A
                        blockList.append(termBlock) # appends termBlock to blockList
                        del termBlock[:] # ensures list is empty for new termBlock
                        termBlock.append(line.strip())
                    elif line.strip() == '[B]' and len(termBlock) !=0: # B
                        del termBlock[:]
                        termBlock.append(line.strip())
                    elif line.strip() == '': # skip line if it's blank
                        continue
                    else: # add all block data
                        termBlock.append(line.strip())
            else:
                metaData.append(line) # adds metaData
        return blockList, metaData
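
A likely reason every element of the returned list looks the same: `blockList.append(termBlock)` stores a reference to the list rather than a copy, and `del termBlock[:]` then empties that same object in place, so every stored entry ends up pointing at one shared list. The sketch below reproduces the effect with made-up values (the variable names are illustrative, not taken from the code above) and shows that rebinding the name to a fresh list avoids it.

```python
# Illustrative values only; this mirrors the append-then-clear pattern.
block_list = []
term_block = ['[A]', 'thing']
block_list.append(term_block)      # stores a reference, not a copy
del term_block[:]                  # empties the same list object in place
term_block.append('[A]')
aliased = block_list[0] is term_block   # the stored entry changed too

# Rebinding the name to a new list leaves stored blocks untouched.
fixed_list = []
block = ['[A]', 'thing']
fixed_list.append(block)
block = ['[A]']                    # new object; fixed_list[0] is unchanged
```

Appending a copy instead (for example `blockList.append(list(termBlock))`) should have the same effect as rebinding.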

I like to use generator functions for this:

import itertools
from pprint import pprint

def stanzas(f):
    stanza = []
    for line in f:
        line = line.strip()
        if line.startswith('['):
            if stanza:
                yield stanza
            stanza = []
        if line:
            stanza += [line]
    if stanza:
        yield stanza

with open('foo.ini') as input_file:
    all_data = stanzas(input_file)
    all_data = sorted(all_data, key = lambda x:x[0])
    all_data = itertools.groupby(all_data, key = lambda x:x[0])
    all_data = {k:list(v) for k,v in all_data}

# All of the data is in a dict in all_data. The dict keys are whatever
# stanza headers in the file there were.
# We can extract out the bits we want using []
bigListA = all_data['[Blocktype A]']
bigListB = all_data['[Blocktype B]']
pprint(bigListA)
pprint(bigListB)
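
If sorting the stanzas first is not wanted, a similar grouping can be sketched with `collections.defaultdict`, which keeps each type's blocks in file order. This is only a sketch under assumptions: the generator is restated so the snippet is self-contained, and an in-memory `io.StringIO` sample with made-up contents stands in for the real file.

```python
import io
from collections import defaultdict

def stanzas(lines):
    """Yield each block as a list of stripped, non-empty lines."""
    stanza = []
    for line in lines:
        line = line.strip()
        if line.startswith('[') and stanza:
            yield stanza           # a new header closes the previous block
            stanza = []
        if line:
            stanza.append(line)
    if stanza:
        yield stanza               # last block has no header after it

# Made-up sample standing in for the real file.
sample = io.StringIO(
    "[Blocktype A]\nthing\nthing\n\n"
    "[Blocktype B]\nthing\n\n"
    "[Blocktype A]\nthing\n"
)

grouped = defaultdict(list)
for stanza in stanzas(sample):
    grouped[stanza[0]].append(stanza)   # key is the header line

big_list_a = grouped['[Blocktype A]']
big_list_b = grouped['[Blocktype B]']
```

Because the blocks are grouped as they are read, no `sorted` or `itertools.groupby` step is needed in this variant.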

Like this:

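with open('foo.ini') as f:  # file name is an assumption; use your own path
    x = f.read()

# Build a list, not a generator, so it can be filtered twice below.
bigLists = [[z.strip('[').strip(']') for z in y.split('\n') if z]
            for y in x.split('\n\n') if y.strip()]

bigListA = [b for b in bigLists if b[0] == 'Blocktype A']
bigListB = [b for b in bigLists if b[0] == 'Blocktype B']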
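
One caveat worth checking with this approach: a generator expression can be consumed only once, so filtering the same generator a second time comes back empty. A small sketch using a made-up in-memory string (the helper name `parse` is hypothetical):

```python
# Made-up sample text; blocks are separated by one blank line.
text = "[Blocktype A]\nthing\n\n[Blocktype B]\nthing\nthing\n"

def parse(text):
    """Return a generator of blocks, each a list of bracket-stripped lines."""
    return ([z.strip('[').strip(']') for z in y.split('\n') if z]
            for y in text.split('\n\n') if y.strip())

gen = parse(text)
a_from_gen = [b for b in gen if b[0] == 'Blocktype A']
b_from_gen = [b for b in gen if b[0] == 'Blocktype B']  # gen is already exhausted

# Materializing the blocks once lets both filters see all of them.
blocks = list(parse(text))
a_blocks = [b for b in blocks if b[0] == 'Blocktype A']
b_blocks = [b for b in blocks if b[0] == 'Blocktype B']
```

Wrapping the expression in `list(...)`, or writing it with square brackets, is enough to make both filters work.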

The output is exactly what you need

def bigList(list_name,start):
    quit_ask = ""
    list_name = []
    l = []
    check = True
    started = False
    with open("TEXT.txt") as text_file:
        for line in text_file:
            line = line.strip()
            if line.startswith(start) or started == True:
                while '' in l: l.remove('')
                if line.startswith(start):
                    quit_ask = line
                    if check != True:
                        list_name.append(l)
                    l = []
                    l.append(line)
                    started = True
                elif line.startswith('[') and line != quit_ask: break
                else: l.append(line); check = False
    list_name.append(l)
    return list_name

bigListA = []
bigListB = []
bigListA = bigList(bigListA,'[Blocktype A]')
bigListB = bigList(bigListB,'[Blocktype B]')

print(bigListA)
print(bigListB)

And you aren't forced to import anything!

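Note: technical posts on this site are shared under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.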

 