
How to unwrap wrapped lines in a text file and reformat the file

I need help finding a Python solution that reformats the wrapped lines, i.e. rewrites the log file so that there are no line breaks as described below. That will let me keep searching the log with each entry on a single, unbroken line.

Every entry in the *.log file is timestamped. Lines that are too long are wrapped as expected; however, the wrapped part is also timestamped. A ">" (greater-than sign) is the only indication that a line has wrapped, and it appears at position 37. The log is from a *nix machine.
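Because the sample lines are fixed-width (a 21-character timestamp, padding, then a message field that starts at 0-based index 37), a wrapped part could also be detected by column instead of by pattern. This is a minimal sketch that assumes every line follows that layout; `MSG_COL`, `is_continuation` and `continuation_text` are hypothetical names, not part of the question.

```python
# Assumed fixed-width layout, taken from the sample lines in the question:
# 21-character timestamp, padding, then the message field at 0-based index 37.
MSG_COL = 37  # assumption: column where '>' appears on wrapped parts


def is_continuation(line):
    """Return True if the line is the wrapped part of the previous entry."""
    return line[MSG_COL:MSG_COL + 1] == ">"


def continuation_text(line):
    """Return the text after the '>' marker, without the trailing newline."""
    return line[MSG_COL + 1:].rstrip("\n")


wrapped = "2011-223-18:31:11.737" + " " * 16 + "> -20.000000 10"
normal = "2011-223-18:31:11.737  VWR:tao" + " " * 7 + "abc exec"
print(is_continuation(wrapped), is_continuation(normal))  # True False
print(repr(continuation_text(wrapped)))  # ' -20.000000 10'
```

A column check is cheap, but it breaks silently if the padding ever changes, so a pattern-based match is the more forgiving choice.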

I don't know how to begin...

2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5
2011-223-18:31:11.737                > -20.000000 10
###needs to be rewritten as:
2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5 -20.000000 10

And another:

2011-223-17:40:07.039  EVT:703       agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignmen
2011-223-17:40:07.039                >t check required.
###these lines deleted and consolidated as one:
2011-223-17:40:07.039  EVT:703       agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignment check required.

I don't know how to begin, other than...

for filename in validfilelist:
    with open(filename, 'r') as logfile:
        logfile_list = logfile.readlines()
    for line in logfile_list:
        # A wrapped part has '>' as the first non-blank character
        # after the 21-character timestamp.
        if line[21:].lstrip().startswith('>'):
            pass  # line_is_broken
        else:
            pass  # line_is_not_broken
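One way to fill in the two branches of that skeleton, sketched under the assumption that each wrapped part continues the entry directly above it: keep a list of merged entries, append the text after ">" to the last entry for a broken line, and start a new entry otherwise. `unwrap_lines` is a hypothetical name; the broken-line test is the same `line[21:]` check used above.

```python
def unwrap_lines(lines):
    """Merge each wrapped part into the log entry just above it."""
    merged = []
    for line in lines:
        line = line.rstrip("\n")
        rest = line[21:].lstrip()  # text after the 21-character timestamp
        if rest.startswith(">") and merged:
            # line_is_broken: keep everything after '>', including a leading
            # space (it separates "5" and "-20.000000" in the first sample).
            merged[-1] += rest[1:]
        else:
            # line_is_not_broken: this line starts a new entry.
            merged.append(line)
    return merged


sample = [
    "2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5\n",
    "2011-223-18:31:11.737                > -20.000000 10\n",
]
for entry in unwrap_lines(sample):
    print(entry)
```

To rewrite a real file, the open file object can be passed directly (`unwrap_lines(logfile)`) and the result written back with `"\n".join(...)`. The `and merged` guard keeps an orphaned wrapped part at the very top of a file as its own entry instead of raising an `IndexError`.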
#!/usr/bin/python

import re

# Matches a wrapped part such as:
#2011-223-18:31:11.737                > -20.000000 10
# group(1) captures everything after the '>' marker.
ptn_wrp = re.compile(r"^\d+-\d+-\d+:\d+:\d+\.\d+\s+>(.*)$")

validfilelist = ["log1.txt", "log2.txt"]

for filename in validfilelist:
    with open(filename, 'r') as logfile, open("%s.new" % filename, 'w') as logfile_new:
        for line in logfile:
            line = line.rstrip('\n')
            m = ptn_wrp.match(line)
            if m:
                # Wrapped part: append it to the current output line.
                logfile_new.write(m.group(1))
            else:
                # New entry: end the previous output line, then start this one.
                logfile_new.write("\n")
                logfile_new.write(line)
        logfile_new.write("\n")

This writes a newline whenever a line is not a wrapped line, so each new entry starts on its own line while wrapped parts are appended to the current one. The only side effect is an empty line at the beginning, which should not be a problem for log analysis. The new file (`<filename>.new`) holds the processed result.
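If the leading empty line does matter, a small variation (a sketch, not part of the original answer) is to write the separating newline only before the second and later entries. The `rewrite` function and its `first` flag are hypothetical names; the regex is the one from the answer, with the dot escaped. `io.StringIO` stands in for real files so the example runs on its own.

```python
import io
import re

ptn_wrp = re.compile(r"^\d+-\d+-\d+:\d+:\d+\.\d+\s+>(.*)$")


def rewrite(logfile, logfile_new):
    """Copy logfile to logfile_new, joining wrapped parts, no leading blank line."""
    first = True
    for line in logfile:
        line = line.rstrip("\n")
        m = ptn_wrp.match(line)
        if m:
            logfile_new.write(m.group(1))  # wrapped part: extend current entry
        else:
            if not first:
                logfile_new.write("\n")  # end the previous entry
            logfile_new.write(line)
            first = False
    if not first:
        logfile_new.write("\n")  # terminate the last entry


src = io.StringIO(
    "2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5\n"
    "2011-223-18:31:11.737                > -20.000000 10\n"
)
dst = io.StringIO()
rewrite(src, dst)
print(repr(dst.getvalue()))
```

With real files, `rewrite` can be called inside the same `with open(...) as logfile, open(...) as logfile_new:` statement the answer uses.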

This would do the trick if you wrap it in a file context (here `f` is a list of sample lines standing in for the open log file):

f = [
    "2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5",
    "2011-223-18:31:11.737                > -20.000000 10",
    "2011-223-17:40:07.039  EVT:703       agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignmen",
    "2011-223-17:40:07.039                >t check required.",
    ]

import re

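# Wrapped part: timestamp, spaces, '>' and the rest of the text.
wrapped_line = r"\d{4}-\d{3}-\d{2}:\d{2}:\d{2}\.\d{3} *>(.*$)"

result = [""]  # placeholder so that result[-1] always exists
for line in f:
    thematch = re.match(wrapped_line, line)
    if thematch:
        # Wrapped part: append it to the previous entry.
        result[-1] += thematch.group(1)
    else:
        # New entry.
        result.append(line)

print(result)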
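Wrapped in an actual file context, the loop needs one adjustment: lines read from a file keep their trailing newline, so it has to be stripped before matching, otherwise the joined entry would contain a line break. This sketch assumes the same `wrapped_line` pattern; `unwrap_file` and the temporary file names are hypothetical. It also starts from an empty list (with an `and result` guard) instead of the `[""]` placeholder, so no empty first entry is written.

```python
import os
import re
import tempfile

wrapped_line = r"\d{4}-\d{3}-\d{2}:\d{2}:\d{2}\.\d{3} *>(.*$)"


def unwrap_file(in_path, out_path):
    """Join wrapped parts from in_path; write one entry per line to out_path."""
    result = []
    with open(in_path) as f:
        for line in f:
            line = line.rstrip("\n")  # lines read from a file keep their '\n'
            thematch = re.match(wrapped_line, line)
            if thematch and result:
                result[-1] += thematch.group(1)  # continuation of previous entry
            else:
                result.append(line)  # start of a new entry
    with open(out_path, "w") as out:
        for entry in result:
            out.write(entry + "\n")


# Demo on a temporary copy of the first sample from the question.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "log1.txt")
    with open(src, "w") as fh:
        fh.write("2011-223-18:31:11.737  VWR:tao       abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5\n")
        fh.write("2011-223-18:31:11.737                > -20.000000 10\n")
    unwrap_file(src, src + ".new")
    with open(src + ".new") as fh:
        print(fh.read(), end="")
```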

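Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM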