Parsing a text file between specific lines

Question

So, if I have a text file that looks like this, I want to create a list of each data block.

[Blocktype A]
thing
thing
thing

[Blocktype A]
thing
thing
thing
thing
thing

[Blocktype A]
thing
thing

[Blocktype B]
thing
thing
thing

Essentially, I want my code to do this:

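If the line == '[Blocktype A]', append the next X lines (the number can vary) to a "block/stanza" list until a blank line is reached. At that point, append this "block" list to an overall list, empty the "block" list, and repeat for the next Blocktype A stanza until the next blank line, and so on. I want to do the same for '[Blocktype B]'.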

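In the end, I'm trying to get lists whose elements are sublists. In other words, one list of all the [Blocktype A] data lists, and one list of all the [Blocktype B] data lists: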

bigListA = [['Blocktype A', 'thing', 'thing', 'thing'], ['Blocktype A', 'thing', 'thing', 'thing', 'thing', 'thing'], ...]

bigListB = same structure as above, for the [Blocktype B] blocks
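The procedure described above can be sketched directly as a small state machine. This is a minimal sketch, assuming every block starts with a bracketed header line and ends at a blank line, another header, or the end of the file. The function name `collect_blocks` and the in-memory `sample` text are hypothetical, used only for illustration.

```python
import io


def collect_blocks(lines, header):
    """Return every block whose header line is exactly '[<header>]'.

    Each block is a list starting with the header text (brackets removed),
    followed by the block's data lines, like the bigListA layout above.
    """
    result = []
    block = None  # None means "not currently inside a matching block"
    for raw in lines:
        line = raw.strip()
        if line == '[' + header + ']':
            if block is not None:      # header right after another block
                result.append(block)
            block = [header]           # always start a brand-new list
        elif line == '':
            if block is not None:      # a blank line ends the block
                result.append(block)
                block = None
        elif line.startswith('['):
            if block is not None:      # a header of another type also ends it
                result.append(block)
                block = None
        elif block is not None:
            block.append(line)         # data line inside a matching block
    if block is not None:              # file ended without a blank line
        result.append(block)
    return result


sample = """[Blocktype A]
thing
thing

[Blocktype A]
thing

[Blocktype B]
thing
"""

bigListA = collect_blocks(io.StringIO(sample), 'Blocktype A')
bigListB = collect_blocks(io.StringIO(sample), 'Blocktype B')
print(bigListA)  # [['Blocktype A', 'thing', 'thing'], ['Blocktype A', 'thing']]
print(bigListB)  # [['Blocktype B', 'thing']]
```

With a real file, pass the open file object instead of the `io.StringIO` wrapper, since both yield one line per iteration.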

I'm not sure how to parse between specific lines like this. Any ideas? Thanks a lot!

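Edit: Here is my code. The problem is that the [B] stanzas get added to the list when they shouldn't be. I feel like my list-clearing step is off. Another problem I just found: when I print the elements of the returned list, every element is the same (just the first block in the file... repeated).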

def getBlock(myFile):
    """
    blah blah blah parses by stanza
    """
    print(myFile)
    with open(myFile, 'r') as inFile:
        print('~~~ newfile ~~~\n\n')
        metaData = list()
        blockList = list()
        for line in inFile:
            if line.strip() == '':  # skips metaData, start of data blocks
                termBlock = list()
                for line in inFile:
                    if line.strip() == '[A]' and len(termBlock) != 0:  # A
                        blockList.append(termBlock)  # appends termBlock to blockList
                        del termBlock[:]  # ensures list is empty for new termBlock
                        termBlock.append(line.strip())
                    elif line.strip() == '[B]' and len(termBlock) != 0:  # B
                        del termBlock[:]
                        termBlock.append(line.strip())
                    elif line.strip() == '':  # skip line if it's blank
                        continue
                    else:  # add all block data
                        termBlock.append(line.strip())
            else:
                metaData.append(line)  # adds metaData
        return blockList, metaData
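The "every element is the same" symptom comes from `del termBlock[:]`. It empties the list object in place, and that same object has just been appended to `blockList`, so every entry of `blockList` refers to one list that keeps being refilled. Separately, the `[B]` branch clears and refills that same list without marking it as unwanted, so the B lines are appended to `blockList` when the next `[A]` header arrives. Below is a minimal illustration, with hypothetical function names, of clearing a list in place versus binding the name to a new list:

```python
def build_with_clear(blocks):
    """Reuses one list object: every entry of the result aliases it."""
    result = []
    current = []
    for block in blocks:
        del current[:]          # "empty" the list in place...
        current.extend(block)   # ...and refill it
        result.append(current)  # appends a reference to the same object
    return result


def build_with_rebind(blocks):
    """Creates a new list for each block: entries stay independent."""
    result = []
    for block in blocks:
        current = []            # a brand-new list object every time
        current.extend(block)
        result.append(current)
    return result


data = [['[A]', 'x'], ['[A]', 'y', 'z']]
print(build_with_clear(data))   # [['[A]', 'y', 'z'], ['[A]', 'y', 'z']]
print(build_with_rebind(data))  # [['[A]', 'x'], ['[A]', 'y', 'z']]
```

In the question's code, the corresponding change is `termBlock = []` in place of `del termBlock[:]`, or appending a copy with `blockList.append(list(termBlock))`.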

Answer 1

I like to use a generator function for this:

import itertools
from pprint import pprint

def stanzas(f):
    stanza = []
    for line in f:
        line = line.strip()
        if line.startswith('['):
            if stanza:
                yield stanza
            stanza = []
        if line:
            stanza += [line]
    if stanza:
        yield stanza

with open('foo.ini') as input_file:
    all_data = stanzas(input_file)
    all_data = sorted(all_data, key = lambda x:x[0])
    all_data = itertools.groupby(all_data, key = lambda x:x[0])
    all_data = {k:list(v) for k,v in all_data}

# All of the data is now in the dict all_data. Its keys are the
# stanza headers that appear in the file.
# Extract the parts we want by indexing with [].
bigListA = all_data['[Blocktype A]']
bigListB = all_data['[Blocktype B]']
pprint(bigListA)
pprint(bigListB)
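`itertools.groupby` only groups *consecutive* items that share a key, which is why the answer sorts the stanzas by header first. Below is a sketch of an alternative that builds the same header-to-stanzas dict in a single pass with `collections.defaultdict`, so no sort is needed. The `stanzas()` generator is copied from the answer so the block runs on its own. The helper name `group_by_header` and the in-memory `sample` text, which stands in for `foo.ini`, are hypothetical.

```python
import io
from collections import defaultdict


def stanzas(f):
    # Same generator as in the answer: yields each [header] block as a list.
    stanza = []
    for line in f:
        line = line.strip()
        if line.startswith('['):
            if stanza:
                yield stanza
            stanza = []
        if line:
            stanza += [line]
    if stanza:
        yield stanza


def group_by_header(f):
    """Map each header line (e.g. '[Blocktype A]') to the list of its stanzas."""
    grouped = defaultdict(list)
    for stanza in stanzas(f):
        grouped[stanza[0]].append(stanza)  # stanza[0] is the header line
    return dict(grouped)


# Interleaved A/B/A blocks: groupby without sorting would split the A group.
sample = "[Blocktype A]\nthing\n\n[Blocktype B]\nthing\n\n[Blocktype A]\nthing\nthing\n"
all_data = group_by_header(io.StringIO(sample))
print(all_data['[Blocktype A]'])  # [['[Blocktype A]', 'thing'], ['[Blocktype A]', 'thing', 'thing']]
print(all_data['[Blocktype B]'])  # [['[Blocktype B]', 'thing']]
```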

Answer 2

Like this:

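# x must hold the whole file as one string, e.g. x = open('foo.ini').read()
# A list (not a generator), so it can be iterated twice below.
bigLists = [[z.strip('[').strip(']') for z in y.split('\n') if z]
            for y in x.split('\n\n')]

bigListA = [b for b in bigLists if b[0] == 'Blocktype A']
bigListB = [b for b in bigLists if b[0] == 'Blocktype B']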
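The answer above assumes `x` already holds the entire file as one string. Below is a self-contained sketch of the same split-on-blank-lines idea with two assumptions of its own. First, blocks are separated by one or more blank or whitespace-only lines, which `re.split(r'\n\s*\n', ...)` handles, including `\r\n` line endings. Second, the brackets are removed only from the header line. The result is built as a list so it can be filtered more than once. The name `split_blocks` is hypothetical.

```python
import re


def split_blocks(text):
    """Split text into blocks at blank lines; strip the brackets off headers."""
    blocks = []
    for chunk in re.split(r'\n\s*\n', text.strip()):
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if lines:
            lines[0] = lines[0].strip('[]')  # '[Blocktype A]' -> 'Blocktype A'
            blocks.append(lines)
    return blocks


text = "[Blocktype A]\nthing\n\n\n[Blocktype B]\nthing\nthing\n"
bigLists = split_blocks(text)  # a list, so it can be iterated more than once
bigListA = [b for b in bigLists if b[0] == 'Blocktype A']
bigListB = [b for b in bigLists if b[0] == 'Blocktype B']
print(bigListA)  # [['Blocktype A', 'thing']]
print(bigListB)  # [['Blocktype B', 'thing', 'thing']]
```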

Answer 3

The output is exactly what you need:

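def bigList(list_name, start):
    # Collects the stanzas from TEXT.txt whose header line starts with `start`.
    # It stops at the first header of a different type that follows them,
    # so it assumes all stanzas of one type appear next to each other.
    quit_ask = ""
    list_name = []
    l = []
    check = True
    started = False
    with open("TEXT.txt") as text_file:
        for line in text_file:
            line = line.strip()
            if line.startswith(start) or started == True:
                while '' in l: l.remove('')  # drop blank separator lines
                if line.startswith(start):
                    quit_ask = line
                    if check != True:
                        list_name.append(l)  # store the previous stanza
                    l = []
                    l.append(line)
                    started = True
                elif line.startswith('[') and line != quit_ask: break
                else: l.append(line); check = False
    while '' in l: l.remove('')  # drop a trailing blank line, if any
    list_name.append(l)
    return list_name

bigListA = []
bigListB = []
bigListA = bigList(bigListA, '[Blocktype A]')
bigListB = bigList(bigListB, '[Blocktype B]')

print(bigListA)
print(bigListB)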

And you don't have to import anything either!


3 answers

Answer 1: score 3, accepted, 2015-10-19 19:27:58

Answer 2: score 1, 2015-10-19 19:14:02

Answer 3: score 1, 2015-10-19 19:37:56
