Python: Parsing a file line by line with different start and end markers

Question

I have a log file like this, with different start and end markers:

#Wiliam
#Arthur
#Jackie
high;
10 11 11;
#Jim
#Jill
#Catherine
#Abby
low;
girl;
10 11 11 11;
#Ablett
#Adelina
none;
5,8;

I need to parse it line by line to get a result like this:

[
  ['#Wiliam','#Arthur','#Jackie','high;','10 11 11;'],
  ['#Jim','#Jill','#Catherine','#Abby','low;','girl;','10 11 11 11;'],
  ['#Ablett','#Adelina','none;','5,8;']
]

Is there a solution?

Answer 1

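As I understand it, each sublist starts with a line beginning with # and ends with a line ending in ;. That is exactly the rule this Pythonic generator implementation relies on: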

def read_lists():
    with open('data') as file:
        sublist = []
        previous_line = ''
        for line in file:
            line = line.strip()
            if line.startswith('#') and previous_line.endswith(';'):
                yield sublist
                sublist = []
            sublist.append(line)
            previous_line = line
        yield sublist

for sublist in read_lists():
    print(sublist)

['#Wiliam', '#Arthur', '#Jackie', 'high;', '10 11 11;']
['#Jim', '#Jill', '#Catherine', '#Abby', 'low;', 'girl;', '10 11 11 11;']
['#Ablett', '#Adelina', 'none;', '5,8;']
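
To try the same boundary rule without a file on disk, it can be applied to any iterable of lines. The following is only a minimal sketch, not part of the original answer: the name split_records is made up for illustration, and it assumes that blank lines should be ignored and that an empty input should produce no sublists at all.

```python
def split_records(lines):
    """Group lines into sublists; a new sublist starts at a '#' line
    that directly follows a line ending in ';'."""
    sublist = []
    previous_line = ''
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines carry no meaning
        if line.startswith('#') and previous_line.endswith(';'):
            yield sublist
            sublist = []
        sublist.append(line)
        previous_line = line
    if sublist:
        yield sublist  # emit the last group only if it is non-empty


# The sample from the question, held in memory instead of a file
sample = """#Wiliam
#Arthur
#Jackie
high;
10 11 11;
#Jim
#Jill
#Catherine
#Abby
low;
girl;
10 11 11 11;
#Ablett
#Adelina
none;
5,8;
""".splitlines()

records = list(split_records(sample))
```

Because the function accepts any iterable, an open file object can be passed in directly as well.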

Answer 2

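To parse a file, you need to find a pattern that lets you collect the data reliably.

From your example, I can see that you stop appending items to a sublist once you read a string made up of integers and a semicolon. I would try something like this: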

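result = []

# 'data' is a placeholder file name; text mode so that lines are str, not bytes
with open('data') as fl:
    sublist = []
    for line in fl:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sublist.append(line)
        # a sublist ends at a line that starts with a digit and ends with ';'
        if line[0].isdigit() and line.endswith(';'):
            result.append(sublist)
            sublist = []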
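
If a check on the first and last character turns out to be too loose, a regular expression can spell out the expected shape of the closing line. This is a sketch under the assumption that a closing line contains only digits, spaces and commas before the final semicolon, which holds for the three closing lines in the question; the names TERMINATOR and ends_sublist are illustrative only.

```python
import re

# Assumed shape: a digit first, then any mix of digits, spaces and commas, then ';'
TERMINATOR = re.compile(r'\d[\d ,]*;')


def ends_sublist(line):
    """Return True if the stripped line matches the assumed closing-line shape."""
    return TERMINATOR.fullmatch(line.strip()) is not None
```

With this pattern a line such as '1st;' is rejected, although it starts with a digit and ends with a semicolon.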

Answer 3

Here is my implementation. I'm not entirely sure what the is_terminator() logic should look like.

def is_terminator(tokens):
    """
    Return True if tokens is a terminator.

    """
    is_token_terminator = False
    tokens = tokens.split()
    if len(tokens) > 0:
        token = tokens[-1]
        if token.endswith(";"):
            try:
                int(token[:-1])
            except ValueError:
                pass  # not an int, so not a terminator (this also rejects "5,8;")
            else:
                is_token_terminator = True
    return is_token_terminator


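sublist = []
result = [sublist]
with open("input.txt") as f:
    for line in f:
        tokens = line.strip()  # drop the trailing newline
        if not tokens:
            continue  # skip blank lines
        sublist.append(tokens)
        if is_terminator(tokens):
            # start a new sublist after each terminator
            sublist = []
            result.append(sublist)

if not result[-1]:
    result.pop()  # drop the empty sublist left after a final terminator

print(result)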
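
Since the answer leaves the terminator logic open, here is one possible reading, offered as a sketch rather than the intended behavior: a line is a terminator if it ends with a semicolon and everything before it is integers separated by spaces and/or commas. The name is_numeric_terminator is hypothetical, and treating commas as separators is an assumption based on the "5,8;" line in the question.

```python
def is_numeric_terminator(line):
    """Return True if line ends with ';' and the rest consists only of
    integers separated by spaces and/or commas."""
    line = line.strip()
    if not line.endswith(';'):
        return False
    # Assumption: commas act as separators, like spaces
    fields = line[:-1].replace(',', ' ').split()
    if not fields:
        return False  # a bare ';' is not treated as a terminator
    try:
        for field in fields:
            int(field)
    except ValueError:
        return False  # at least one field is not an integer
    return True
```

Unlike a check on the last token alone, this version requires every field on the line to be an integer.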

Answer 4

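This appends lines to a sublist until it reaches a point where the previously appended line ends with a semicolon but the current line does not. At that point, it creates a new sublist and continues.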

lst = [[]]

try:
    with open("log.log", "r") as f:
        i = 0 # Index of the sub-list

        for line in f:
            line = line.strip()

            if line[-1:] != ";" and lst[i] and lst[i][-1][-1:] == ";":
                i += 1 # Increment the sub-list index.
                lst.append([]) # Append a new empty sub-list.

            lst[i].append(line)
except FileNotFoundError:
    print("File does not exist.")

print(lst)
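
One detail in the condition above is the use of the slice [-1:] instead of the index [-1]. Presumably this is there so that an empty string (for example a blank line after strip()) does not raise an error. A small illustration of the difference, with variable names chosen only for this example:

```python
empty = ''

# Slicing an empty string returns '' instead of raising
last_char = empty[-1:]

# Plain indexing raises IndexError on an empty string
try:
    empty[-1]
    indexing_failed = False
except IndexError:
    indexing_failed = True

# On a non-empty string the slice gives the last character
last_of_low = 'low;'[-1:]
```

Note that the slice only prevents the exception; a blank line would still be appended to a sublist as an empty string.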


4 answers

Answer 1
2 accepted 2017-09-30 04:09:14

Answer 2
0 2017-09-30 03:35:14

Answer 3
0 2017-09-30 04:01:48

Answer 4
0 2017-09-30 04:04:49
