How to unwrap wrapped lines in text file, reformat text file
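I need help finding a Python solution to reformat wrapped lines, i.e. rewrite a log file so that it contains none of the line breaks described below. That would let me keep searching on unbroken lines.
Every entry in the *.log is timestamped. Lines that are too long are wrapped as expected, but the wrapped part is timestamped too. A ">" (greater-than) is the only indication that a line has been wrapped; it occurs at position 37. The logs come from a *nix machine.
I have no idea how to start...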
2011-223-18:31:11.737 VWR:tao abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5
2011-223-18:31:11.737 > -20.000000 10
###needs to be rewritten as:
2011-223-18:31:11.737 VWR:tao abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5 -20.000000 10
And another one:
2011-223-17:40:07.039 EVT:703 agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignmen
2011-223-17:40:07.039 >t check required.
###these lines deleted and consolidated as one:
2011-223-17:40:07.039 EVT:703 agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignment check required.
I don't know how to start, other than...
for filename in validfilelist:
    logfile = open(filename, 'r')
    logfile_list = logfile.readlines()
    logfile.close()
    for line in logfile_list:
        pass  # the unwrapping logic would go here
for filename in validfilelist:
    logfile = open(filename, 'r')
    logfile_list = logfile.readlines()
    logfile.close()
    for line in logfile_list:
        # the timestamp occupies the first 21 characters of every line
        if line[21:].lstrip().startswith('>'):
            pass  # line_is_broken
        else:
            pass  # line_is_not_broken
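The check above only classifies each line. As a minimal sketch of how the merge itself might be finished with the same slicing idea: the names `TIMESTAMP_WIDTH` and `unwrap_lines` are made up for illustration, and the code assumes that no ordinary message begins with ">" right after the timestamp.

```python
TIMESTAMP_WIDTH = 21  # len("2011-223-18:31:11.737")

def unwrap_lines(lines):
    """Merge '>' continuation lines into the entry that precedes them."""
    merged = []
    for line in lines:
        line = line.rstrip('\n')
        # everything after the timestamp, without the padding before '>'
        rest = line[TIMESTAMP_WIDTH:].lstrip()
        if rest.startswith('>') and merged:
            # drop the '>' marker and append the remainder to the previous entry
            merged[-1] += rest[1:]
        else:
            merged.append(line)
    return merged
```

Passing `logfile_list` from the loop above to `unwrap_lines` would give one string per log entry, without trailing newlines.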
#!/usr/bin/python
import re

# a wrapped line looks like: 2011-223-18:31:11.737 > -20.000000 10
ptn_wrp = re.compile(r"^\d+-\d+-\d+:\d+:\d+\.\d+\s+>(.*)$")

validfilelist = ["log1.txt", "log2.txt"]

for filename in validfilelist:
    logfile = open(filename, 'r')
    logfile_new = open("%s.new" % filename, 'w')
    for line in logfile:
        line = line.rstrip('\n')
        m = ptn_wrp.match(line)
        if m:
            # continuation: append it to the entry already written
            logfile_new.write(m.group(1))
        else:
            # new entry: terminate the previous one first
            logfile_new.write("\n")
            logfile_new.write(line)
    logfile_new.write("\n")
    logfile.close()
    logfile_new.close()
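A newline is written only when the current line is not a continuation. The only side effect is an empty line at the start of the output, which should not be a problem for log analysis. The new file holds the processed result.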
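If the empty first line does matter, one possible variation is to remember whether anything has been written yet. This is only a sketch: the function name `unwrap_file` and the `first` flag are invented here, and it assumes the same log layout as the samples in the question.

```python
import re

# Same pattern idea as above; the dot before the milliseconds is escaped.
WRAPPED = re.compile(r"^\d+-\d+-\d+:\d+:\d+\.\d+\s+>(.*)$")

def unwrap_file(src_path, dst_path):
    """Copy src_path to dst_path, joining '>' continuation lines."""
    first = True  # True until the first line has been written
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            line = line.rstrip('\n')
            m = WRAPPED.match(line)
            if m:
                # continuation: append to the entry already written
                dst.write(m.group(1))
            else:
                # new entry: close the previous one, except at the very start
                if not first:
                    dst.write('\n')
                dst.write(line)
            first = False
        if not first:
            dst.write('\n')  # terminate the last entry
```

The `with` statement closes both files even if an error occurs part-way through, so the explicit `close()` calls are no longer needed.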
This will do the trick if you wrap it in a file context:
f = [
    "2011-223-18:31:11.737 VWR:tao abc exec /home/abcd/abcd9.94/bin/set_specb.tcl -s DL 2242.500000 5",
    "2011-223-18:31:11.737 > -20.000000 10",
    "2011-223-17:40:07.039 EVT:703 agc_drift_cal.tcl: out of tolerance drift of 5.3080163871 detected! Downlink Alignmen",
    "2011-223-17:40:07.039 >t check required.",
]
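import re

# raw string, so the backslash escapes reach the re module unchanged
wrapped_line = r"\d{4}-\d{3}-\d{2}:\d{2}:\d{2}\.\d{3} *>(.*$)"
result = [""]  # the leading "" absorbs a continuation on the very first line
for line in f:
    thematch = re.match(wrapped_line, line)
    if thematch:
        # continuation: glue it onto the previous entry
        result[-1] += thematch.group(1)
    else:
        result.append(line)
print(result)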
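A sketch of what "wrapping it in a file context" could look like: the same regex moved into a generator that accepts any iterable of lines, including an open file. The name `unwrapped` and the file name in the usage comment are placeholders, not part of the original answer.

```python
import re

WRAPPED_LINE = re.compile(r"\d{4}-\d{3}-\d{2}:\d{2}:\d{2}\.\d{3} *>(.*)$")

def unwrapped(lines):
    """Yield log entries with '>' continuation lines joined back on."""
    current = None  # the entry being assembled, if any
    for line in lines:
        line = line.rstrip('\n')
        m = WRAPPED_LINE.match(line)
        if m and current is not None:
            current += m.group(1)
        else:
            if current is not None:
                yield current
            current = line
    if current is not None:
        yield current

# Usage with a real file ("example.log" is a placeholder name):
# with open("example.log") as logfile:
#     for entry in unwrapped(logfile):
#         print(entry)
```

Because entries are yielded one at a time, the whole log never has to be held in memory, and no empty string appears at the start of the output.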