[英]Python: text log file processing and transposing rows to columns
I am new to python and stuck with a log file in text format, where it has following repetitive structure and I am required to extract the data from rows and change it into column depending upon the data. 我是python的新手,并且停留在文本格式的日志文件中,该文件具有以下重复结构,因此我需要从行中提取数据,然后根据数据将其更改为列。 eg
例如
First 50 line are trash like below(in first six lines): 前50行是以下垃圾内容(前6行):
-------------------------------------------------------------
Logging to file xyz.
Char
1,
3
r
=
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3
----------------------------------------------
a 1
b 2
c 3
----------------------------------------------
Pid 0
Name SAB=1, XYZ=3, P_NO=546467
----------------------------------------------
Test_data_1 00001
Test_data_2 FOXABC
Test_data_3 SHEEP123
Country US
----------------------------------------------
Pid 0
Name SAB=1
----------------------------------------------
Sno 893489423
------------Continues for another million lines. ------------继续进行另外一百万行。
Now the required output is like below: 现在所需的输出如下:
Required output format 所需的输出格式
PID, Name, a,b,c
0, "SAB=1, XYZ=3", 1,2,3
PID, Name , Test_data_1, Test_data_2, Test_data_3, Country
0, "SAB=1, XYZ=3, P_NO=546467", 00001, FOXABC, SHEEP123, US
Pid, Name, Sno
0, SAB=1, 893489423
I tried to write a code but failed to get the desired results: My attempt was as below: 我尝试编写代码,但未能获得预期的结果:我的尝试如下:
'''
fn=open(file_name,'r')
for i,line in enumerate(fn ):
if i >= 50 and "Name " in line: # for first 50 line deletion/or starting point
last_tag=line.split(",")[-1]
last_element=last_tag.split("=")[0]
print(last_element)
''' '''
Any help would be really appreciated. 任何帮助将非常感激。
The solution I came up with is a bit messy but it works, check it out below: 我想出的解决方案有点混乱,但它可以用,请在下面查看:
import sys
import re
import StringIO
ifile = open(sys.argv[1],'r') #Input log file as command-line argument
ofile = open(sys.argv[1][:-4]+"_formatted.csv",'w') #output formatted log txt
stringOut = ""
i = 0
flagReturn = True
j = 0
reVal = re.compile("Pid[\s]+(.*)\nName[\s]+(.*)\n[-]+\<br\>(.*)\<br\>") #Regex pattern for separating the Pid & Name from the variables
reVar = re.compile("(.*)[ ]+(.*)") #Regex pattern for getting vars and their values
reVarStr = re.compile(">>> [0-9]+.(.*)=(.*)") #Regex Pattern for Struct
reVarStrMatch = re.compile("Struct(.*)+has(.*)+members:") #Regex pattern for Struct check
for lines in ifile.readlines():
if(i>8): #Omitting the first 9 lines of Garbage values
if(lines.strip()=="----------------------------------------------"): #Checking for separation between PID & Name group and the Var group
j+=1 #variable keeping track of whether we are inside the vars section or not (between two rows of hyphens)
flagReturn = not flagReturn #To print the variables in single line to easily separate them with regex pattern reVal
if(not flagReturn):
stringTmp = lines.strip()+"<br>" #adding break to the end of each vars line in order for easier separation
else:
stringTmp = lines #if not vars then save each line as is
stringOut += stringTmp #concatenating each lines to form the searchable string
i+=1 #incrementing for omitting lines (useless after i=8)
if(j==2): #Once a complete set of PIDs, Names and Vars have been collected
j=0 #Reset j
matchObj = reVal.match(stringOut) #Match for PID, Name & Vars
line1 = "Pid,Name,"
line2 = matchObj.group(1).strip()+",\""+matchObj.group(2)+"\","
buf = StringIO.StringIO(matchObj.group(3).replace("<br>","\n"))
structFlag = False
for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
if(not (reVarStrMatch.match(line) is None)):
structFlag = True
elif(structFlag and (not (reVarStr.match(line) is None))):
matchObjVars = reVarStr.match(line)
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
else:
structFlag = False
matchObjVars = reVar.match(line)
try:
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
except:
line1 += line.strip()+","
line2 += " ,"
ofile.writelines(line1[:-1]+"\n")
ofile.writelines(line2[:-1]+"\n")
ofile.writelines("\n")
stringOut = "" #Reseting the string
ofile.close()
ifile.close()
EDIT This is what I came up with to include the new pattern as well. 编辑这是我想出的也包括新模式的内容。
I suggest you do the following: 我建议您执行以下操作:
EDIT 2 编辑2
structFlag = False
RBYflag = False
for line in buf.readlines(): #Separate each vars and add to the respective strings for writing to file
if(not (reVarStrMatch.match(line) is None)):
structFlag = True
elif(structFlag and (not (reVarStr.match(line) is None))):
matchObjVars = reVarStr.match(line)
if(matchObjVars.group(1).strip()=="RBY" and not RBYFlag):
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+"**"
RBYFlag = True
elif(matchObjVars.group(1).strip()=="RBY"):
line2 += matchObjVars.group(2).strip()+"**"
else:
if(RBYFlag):
line2 = line2[:-2]
RBYFlag = False
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
else:
structFlag = False
if(RBYFlag):
line2 = line2[:-2]
RBYFlag = False
matchObjVars = reVar.match(line)
try:
line1 += matchObjVars.group(1).strip()+","
line2 += matchObjVars.group(2).strip()+","
except:
line1 += line.strip()+","
line2 += " ,"`
NOTE This loop has become very bloated and it is better to create a separate function to identify the type of data and return some value accordingly. 注意此循环变得非常膨胀,最好创建一个单独的函数来识别数据类型并相应地返回一些值。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.