Python: read multiline records into a single line while reading a file line by line
Starting from the following file, mwe.log:
07:23:07.754 A
07:23:07.759 B
C
D
E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III
I would expect to get
07:23:07.754 A
07:23:07.759 B C D E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III
by executing this code:
import re

input_file = "mwe.log"


def read_logfile(full_file, start):
    result_intermediate_line = ''
    with open(input_file, 'r') as fin:
        for _raw_line in fin:
            log_line = _raw_line.rstrip()
            #result = ''
            if start.match(log_line):
                if len(result_intermediate_line) > 0:
                    result = result_intermediate_line
                else:
                    result = log_line
            else:
                result = result_intermediate_line + log_line
            yield result


if __name__ == "__main__":
    number_line = re.compile(r'^\d+\:\d+\:\d+\.\d+\s+')
    for line in read_logfile(input_file, number_line):
        print(line)
It should work with Python 3.7 and above. My issue is that I would like every line to start with a timestamp, as shown above, so that I can post-process each line on its own.
It can be seen as a converter from format 1 to format 2.
Do you have any idea where my bug is?
This should work:
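import re

input_file = "mwe.log"


def read_logfile(input_file, start):
    with open(input_file, "r") as fin:
        # Start the buffer with the first line; stop cleanly if the file is empty.
        first_line = next(fin, None)
        if first_line is None:
            return
        result_intermediate_line = first_line.rstrip()
        for _raw_line in fin:
            log_line = _raw_line.rstrip()
            if start.match(log_line):
                # A new timestamp: emit the finished record and start a new one.
                previous_line = result_intermediate_line
                result_intermediate_line = log_line
                yield previous_line
            else:
                # No timestamp: append to the record being built.
                result_intermediate_line += " " + log_line
        # Emit the last record once the file is exhausted.
        yield result_intermediate_line


if __name__ == "__main__":
    number_line = re.compile(r"^\d+\:\d+\:\d+\.\d+\s+")
    for line in read_logfile(input_file, number_line):
        print(line)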
The problem is that you were yielding on every line. Instead, I yield only when the new line starts with a timestamp; otherwise I append the line to the previous one.
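The same buffering idea can also be written against any iterable of lines, which makes it easy to try without a file. This is only a minimal sketch, not the answer's original code: the helper name `merge_continuation_lines` is hypothetical, and treating lines that appear before the first timestamp as their own record is an assumption.

```python
import re

# Timestamp at the start of a line, e.g. "07:23:07.754 " (same pattern as above).
TIMESTAMP = re.compile(r"^\d+:\d+:\d+\.\d+\s+")


def merge_continuation_lines(lines, start=TIMESTAMP):
    """Yield one record per timestamped line, with continuation lines appended.

    Hypothetical helper used for illustration only.
    """
    buffer = None  # the record currently being assembled
    for raw_line in lines:
        line = raw_line.rstrip()
        if start.match(line) or buffer is None:
            # A new record begins: emit the finished one first, if any.
            if buffer is not None:
                yield buffer
            buffer = line
        else:
            # No timestamp: this line continues the current record.
            buffer += " " + line
    if buffer is not None:
        # Emit the last record; nothing is yielded for empty input.
        yield buffer


sample = [
    "07:23:07.754 A\n",
    "07:23:07.759 B\n",
    "C\n",
    "D\n",
    "E\n",
    "07:23:07.770 I\n",
    "07:23:07.770 II\n",
    "07:23:07.770 III\n",
]
for record in merge_continuation_lines(sample):
    print(record)
```

Because an open file is itself an iterable of lines, the file object from `open("mwe.log")` can be passed in place of `sample`.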
You could also parse the file entirely and return just once, like so:
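import re


def read_logfile(file, pattern):
    result = list()
    with open(file) as fin:
        for line in fin:
            if pattern.match(line.strip()):
                # A timestamped line starts a new record.
                result.append(line.strip())
            else:
                # Otherwise append to the last record (assumes the file starts with a timestamped line).
                result[-1] += f" {line.strip()}"
    return "\n".join(result)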
>>> print(read_logfile("mwe.log", re.compile(r"^\d+\:\d+\:\d+\.\d+\s+")))
07:23:07.754 A
07:23:07.759 B C D E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III
Another approach, leveraging the power of re.sub:
import re

input_file = "mwe.log"
time_pattern = r'\d+\:\d+\:\d+\.\d+\s+'
new_line_pattern = re.compile(rf'{time_pattern}.*?(?=\n{time_pattern})', re.DOTALL)

with open(input_file, 'r') as fin:
    log = fin.read()

new_log = re.sub(new_line_pattern, lambda x: x.group(0).replace("\n", " "), log)
print(new_log)
Output:
07:23:07.754 A
07:23:07.759 B C D E
07:23:07.770 I
07:23:07.770 II
07:23:07.770 III
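One thing to keep in mind with the pattern above: it only matches a record when another timestamped line follows it, so a multi-line record at the very end of the file would be left unmerged. A possible variant, shown here only as a sketch, replaces every newline that is not followed by a timestamp with a space. The in-memory `log` string stands in for the file contents, and the negative lookahead is a swapped-in technique, not part of the original answer.

```python
import re

# Sample text standing in for the contents of mwe.log.
log = (
    "07:23:07.754 A\n"
    "07:23:07.759 B\n"
    "C\n"
    "D\n"
    "E\n"
    "07:23:07.770 I\n"
    "07:23:07.770 II\n"
    "07:23:07.770 III\n"
)

# A newline is turned into a space unless the next line starts with a
# timestamp or the newline is the last character of the text (\Z).
continuation_newline = re.compile(r"\n(?!\d+:\d+:\d+\.\d+\s|\Z)")

new_log = continuation_newline.sub(" ", log)
print(new_log, end="")
```

This keeps the final newline of the file intact and also merges continuation lines that belong to the last record.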
You initialize the variable result_intermediate_line with the value '' and never change it,
so the concatenation
result = result_intermediate_line + log_line
doesn't have any effect: result is always just log_line.
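To make this concrete, here is a small sketch that runs the question's branching logic over a few in-memory lines instead of a file (the `lines` list and the `outputs` list are illustrative assumptions). Every line comes out unchanged, because the buffer stays empty throughout.

```python
import re

start = re.compile(r"^\d+:\d+:\d+\.\d+\s+")
lines = ["07:23:07.759 B", "C", "D"]

result_intermediate_line = ""  # set once and never reassigned, as in the question
outputs = []
for log_line in lines:
    if start.match(log_line):
        if len(result_intermediate_line) > 0:  # never true: the buffer is always ""
            result = result_intermediate_line
        else:
            result = log_line
    else:
        result = result_intermediate_line + log_line  # "" + log_line == log_line
    outputs.append(result)

print(outputs == lines)  # True: the input is passed through unchanged
```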