Python: parse file line by line with different start and end marks
I have a log file like this, with different start and end marks:
#Wiliam
#Arthur
#Jackie
high;
10 11 11;
#Jim
#Jill
#Catherine
#Abby
low;
girl;
10 11 11 11;
#Ablett
#Adelina
none;
5,8;
I need to parse it line by line to get a result like this:
[
['#Wiliam','#Arthur','#Jackie','high;','10 11 11;'],
['#Jim','#Jill','#Catherine','#Abby','low;','girl;','10 11 11 11;'],
['#Ablett','#Adelina','none;','5,8;']
]
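Is there a solution?
As I understand it, each sublist starts with a line beginning with # and ends with a line ending in ;. That is exactly the rule this Pythonic generator implementation uses: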
def read_lists():
    with open('data') as file:
        sublist = []
        previous_line = ''
        for line in file:
            line = line.strip()
            if line.startswith('#') and previous_line.endswith(';'):
                yield sublist
                sublist = []
            sublist.append(line)
            previous_line = line
        yield sublist

for sublist in read_lists():
    print(sublist)

Output:
['#Wiliam', '#Arthur', '#Jackie', 'high;', '10 11 11;']
['#Jim', '#Jill', '#Catherine', '#Abby', 'low;', 'girl;', '10 11 11 11;']
['#Ablett', '#Adelina', 'none;', '5,8;']
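The same boundary rule can be applied to any iterable of lines, which makes it easy to try without a file on disk. The following is a minimal sketch, not part of the answer above: the helper name `split_records` is hypothetical, and it assumes that blank lines carry no data and that an empty input should produce no sublists.

```python
def split_records(lines):
    """Yield sublists; a new one starts when a '#' line follows a ';' line."""
    sublist = []
    previous_line = ''
    for line in lines:
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        if line.startswith('#') and previous_line.endswith(';'):
            yield sublist
            sublist = []
        sublist.append(line)
        previous_line = line
    if sublist:  # avoid yielding an empty list for empty input
        yield sublist


sample = """#Wiliam
#Arthur
#Jackie
high;
10 11 11;
#Jim
#Jill
#Catherine
#Abby
low;
girl;
10 11 11 11;
#Ablett
#Adelina
none;
5,8;"""

records = list(split_records(sample.splitlines()))
for record in records:
    print(record)
```

Because the function accepts any iterable of strings, an open file object can be passed in directly as well.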
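To parse a file, you need to find a pattern that lets you collect the data reliably.
From your example, I can see that you stop appending items to a sublist when you read a line that contains integers and ends with a semicolon. I would try something like this: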
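result = []
# f is the path to the log file
with open(f) as fl:
    sublist = []
    for line in fl:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        sublist.append(line)
        # A line that starts with a digit and ends with ';' closes the sublist
        if line[0].isdigit() and line.endswith(';'):
            result.append(sublist)
            sublist = []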
Here is my implementation. I am not entirely sure what the is_terminator() logic should look like.
def is_terminator(tokens):
    """
    Return True if tokens is a terminator.
    """
    is_token_terminator = False
    tokens = tokens.split()
    if len(tokens) > 0:
        token = tokens[-1]
        if token.endswith(";"):
            try:
                int(token[:-1])
            except ValueError:
                pass  # not an int.. and so not a terminator?
            else:
                is_token_terminator = True
    return is_token_terminator

sublist = []
result = [sublist, ]
with open("input.txt", "r") as f:
    for tokens in f:
        sublist.append(tokens.strip())  # drop the trailing newline
        if is_terminator(tokens):
            sublist = []
            result.append(sublist)
if not result[-1]:
    result.pop()  # drop the empty sublist left after a final terminator
print(result)
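One consequence of the int() check above is that a line such as 5,8; is not treated as a terminator, because int('5,8') raises ValueError. If comma-separated numbers should also close a sublist, the test could be loosened. The sketch below is only one possible reading of the format: the name `is_numeric_terminator` is hypothetical, and it assumes a terminator ends with ; and otherwise contains only digits, whitespace, and commas, with at least one digit.

```python
import re

# Assumption: digits, whitespace and commas only, at least one digit, then ';'
_NUMERIC_TERMINATOR = re.compile(r'[\d\s,]*\d[\d\s,]*;')


def is_numeric_terminator(line):
    """Return True if the stripped line looks like a numeric terminator."""
    return _NUMERIC_TERMINATOR.fullmatch(line.strip()) is not None


for text in ['10 11 11;', '5,8;', 'high;', '#Jim']:
    print(repr(text), is_numeric_terminator(text))
```

Whether 5,8; really is a terminator depends on the log format, which the question does not spell out, so the pattern should be adjusted to the real data.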
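This appends lines to a sublist until it reaches a point where the previously appended line ends with a semicolon but the current line does not. At that point it creates a new sublist and continues.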
lst = [[]]
try:
    with open("log.log", "r") as f:
        i = 0  # Index of the sub-list
        for line in f:
            line = line.strip()
            if line[-1:] != ";" and lst[i] and lst[i][-1][-1:] == ";":
                i += 1  # Increment the sub-list index.
                lst.append([])  # Append a new empty sub-list.
            lst[i].append(line)
except FileNotFoundError:
    print("File does not exist.")
print(lst)
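A different technique, not used in any of the answers above, is to group consecutive lines by whether they start with # using itertools.groupby. Each record is then a run of # lines followed by the run of other lines after it. This is a rough sketch under the assumptions that every record begins with at least one # line and that blank lines can be ignored; `group_records` is a hypothetical name.

```python
from itertools import groupby


def group_records(lines):
    """Build records from alternating runs of '#' lines and other lines."""
    stripped = (line.strip() for line in lines)
    non_blank = (line for line in stripped if line)
    records = []
    for is_header, run in groupby(non_blank, key=lambda l: l.startswith('#')):
        run = list(run)
        if is_header or not records:
            records.append(run)      # a run of '#' lines opens a new record
        else:
            records[-1].extend(run)  # the following lines complete it
    return records


sample_lines = [
    '#Wiliam', '#Arthur', '#Jackie', 'high;', '10 11 11;',
    '#Jim', '#Jill', '#Catherine', '#Abby', 'low;', 'girl;', '10 11 11 11;',
    '#Ablett', '#Adelina', 'none;', '5,8;',
]
grouped = group_records(sample_lines)
for record in grouped:
    print(record)
```

Unlike the semicolon-based checks, this version relies only on the # prefix, so it would misbehave if a data line inside a record could itself start with #.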