Python: Processing a text log file and converting rows to columns

Question

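I am new to Python and stuck on a text-format log file with the repeating structure shown below. I need to extract the data from the rows and then turn it into columns, based on the data. For example: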

The first 50 lines are junk like the following (the first 6 lines are shown):

    ------------------------------------------------------------- 
Logging to file  xyz.
Char 
1,
 3 
r
 =

 ---------------------------------------------- 
Pid                             0 
Name                                   SAB=1, XYZ=3
---------------------------------------------- 
a              1 
b              2 
c              3
---------------------------------------------- 
Pid                             0 
Name                                   SAB=1, XYZ=3, P_NO=546467
---------------------------------------------- 
Test_data_1              00001 
Test_data_2              FOXABC 
Test_data_3         SHEEP123
Country             US
---------------------------------------------- 
Pid                             0 
Name                                   SAB=1
---------------------------------------------- 
Sno                  893489423

Log file format

------------ and so on for another million lines.

The required output is as follows:

Required output format

PID, Name,       a,b,c
0, "SAB=1, XYZ=3", 1,2,3

PID, Name         , Test_data_1, Test_data_2, Test_data_3, Country
0, "SAB=1, XYZ=3, P_NO=546467", 00001, FOXABC, SHEEP123, US

Pid, Name, Sno
0, SAB=1, 893489423

I tried writing code but could not get the expected result. My attempt is below:

'''
fn=open(file_name,'r')
for i,line in enumerate(fn ):
   if i >= 50 and "Name " in line:   # for first 50 line deletion/or starting point
         last_tag=line.split(",")[-1]
         last_element=last_tag.split("=")[0]
         print(last_element)

'''

Any help would be greatly appreciated.

Newly discovered structure

RBY structure

Answer 1

The solution I came up with is a bit messy, but it works. See below:

import io
import os
import re
import sys

# Regex pattern for separating the Pid & Name from the variables
reVal = re.compile(r"Pid\s+(.*)\nName\s+(.*)\n-+<br>(.*)<br>")
# Regex pattern for getting vars and their values
reVar = re.compile(r"(.*)[ ]+(.*)")
# Regex pattern for Struct members
reVarStr = re.compile(r">>> [0-9]+.(.*)=(.*)")
# Regex pattern for the Struct check
reVarStrMatch = re.compile(r"Struct.*has.*members:")

SEPARATOR = "----------------------------------------------"


def writeRecord(stringOut, ofile):
    """Write one Pid/Name/vars group as a header row plus a value row."""
    matchObj = reVal.match(stringOut)  # Match for Pid, Name & vars
    if matchObj is None:  # Not a complete group, so there is nothing to write
        return
    line1 = "Pid,Name,"
    line2 = matchObj.group(1).strip() + ",\"" + matchObj.group(2).strip() + "\","
    buf = io.StringIO(matchObj.group(3).replace("<br>", "\n"))
    structFlag = False
    for line in buf.readlines():  # Separate each var and add it to the respective string
        if reVarStrMatch.match(line) is not None:
            structFlag = True
        elif structFlag and reVarStr.match(line) is not None:
            matchObjVars = reVarStr.match(line)
            line1 += matchObjVars.group(1).strip() + ","
            line2 += matchObjVars.group(2).strip() + ","
        else:
            structFlag = False
            matchObjVars = reVar.match(line)
            if matchObjVars is not None:
                line1 += matchObjVars.group(1).strip() + ","
                line2 += matchObjVars.group(2).strip() + ","
            else:  # The line has a name but no value
                line1 += line.strip() + ","
                line2 += " ,"
    ofile.write(line1[:-1] + "\n")
    ofile.write(line2[:-1] + "\n")
    ofile.write("\n")


# Input log file as command-line argument; the CSV is written next to it
with open(sys.argv[1], 'r') as ifile, \
        open(os.path.splitext(sys.argv[1])[0] + "_formatted.csv", 'w') as ofile:
    stringOut = ""
    flagReturn = True
    j = 0  # Counts the separator rows seen in the current group
    for i, lines in enumerate(ifile):
        if i <= 8:  # Omit the first 9 lines of garbage values
            continue
        if lines.strip() == SEPARATOR:  # Separation between the Pid & Name group and the vars group
            j += 1
            flagReturn = not flagReturn  # Vars go on a single line so reVal can separate them
        if not flagReturn:
            stringTmp = lines.strip() + "<br>"  # Mark the end of each vars line for easier separation
        else:
            stringTmp = lines  # Not a vars line, so save it as is
        stringOut += stringTmp  # Build up the searchable string
        if j == 2:  # A complete set of Pid, Name and vars has been collected
            j = 0
            writeRecord(stringOut, ofile)
            stringOut = ""  # Reset the string
    if j == 1:  # The last group has no closing separator row
        writeRecord(stringOut, ofile)

Edit: The script above is what I came up with; it also handles the new patterns.
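For comparison, here is a minimal sketch of the same Pid/Name/variables grouping written as a line-by-line generator, so the million-line file never has to be held in one string. It is not the script above: it assumes separator rows contain only hyphens, that every group starts with a `Pid` line followed by a `Name` line, and that no variable is itself called `Pid`. The names `iter_records` and `write_records` are made up for illustration, and the Struct and RBY patterns are not handled here.

```python
import csv


def iter_records(lines):
    """Yield (pid, name, fields) for each Pid/Name group in an iterable of lines."""
    pid = name = None
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line or set(line) == {"-"}:  # Skip blank lines and separator rows
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if key == "Pid":
            if pid is not None:  # A new Pid line closes the previous group
                yield pid, name, fields
            pid, name, fields = value, None, []
        elif key == "Name" and pid is not None and name is None:
            name = value
        elif name is not None:  # Anything after Name is a variable row
            fields.append((key, value))
        # Lines before the first Pid (the junk preamble) fall through and are ignored
    if pid is not None:  # Flush the last group, which has no closing separator
        yield pid, name, fields


def write_records(records, out):
    """Write each group as a header row, a value row and an empty row."""
    writer = csv.writer(out)  # Quotes fields that contain commas, such as Name
    for pid, name, fields in records:
        writer.writerow(["Pid", "Name"] + [key for key, _ in fields])
        writer.writerow([pid, name] + [value for _, value in fields])
        writer.writerow([])
```

Calling `write_records(iter_records(src), dst)` with two open file objects (the output opened with `newline=""`) would process the log one line at a time. Because the preamble is skipped by waiting for the first `Pid` line, no line count such as 9 or 50 is needed.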

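I suggest you do the following:

1. Run the parser script on a copy of the log file and see where it fails next.
2. Identify and write down the new pattern that broke the parser.
3. Delete all data that matches the newly identified pattern.
4. Repeat from step 1 until all patterns have been identified.
5. Create a separate regex pattern for each type of pattern, then call them in separate functions to write to the string.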
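The last step can be sketched as a small dispatch table: one compiled regex per pattern, each paired with its own handler function. The three regexes are the ones used in the script above; the handler names, `parse_line` and `unknown_lines` are hypothetical. Unlike the script, this sketch does not track whether a Struct header came first, so treat it as a starting point under that simplification.

```python
import re

# The patterns come from the script above; all function names are hypothetical.
STRUCT_HEADER = re.compile(r"Struct.*has.*members:")
STRUCT_MEMBER = re.compile(r">>> [0-9]+.(.*)=(.*)")
PLAIN_VAR = re.compile(r"(.*)[ ]+(.*)")


def handle_header(match):
    """A Struct header contributes no column of its own."""
    return []


def handle_pair(match):
    """Return a single (column name, value) pair."""
    return [(match.group(1).strip(), match.group(2).strip())]


# Order matters: the most specific pattern is tried first.
HANDLERS = [
    (STRUCT_HEADER, handle_header),
    (STRUCT_MEMBER, handle_pair),
    (PLAIN_VAR, handle_pair),
]


def parse_line(line):
    """Return the pairs for a recognized line, or None for an unknown pattern."""
    for pattern, handler in HANDLERS:
        match = pattern.match(line)
        if match is not None:
            return handler(match)
    return None


def unknown_lines(lines):
    """Collect the non-blank lines that no pattern recognizes."""
    return [line for line in lines if line.strip() and parse_line(line) is None]
```

Running `unknown_lines` over a copy of the log gives the list of lines to inspect when looking for the next new pattern; supporting one then means adding a regex and a handler to `HANDLERS`.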

Edit 2

# Replacement for the inner loop of the script above, with RBY handling added
structFlag = False
RBYFlag = False
for line in buf.readlines():  # Separate each var and add it to the respective string
    if reVarStrMatch.match(line) is not None:
        structFlag = True
    elif structFlag and reVarStr.match(line) is not None:
        matchObjVars = reVarStr.match(line)
        if matchObjVars.group(1).strip() == "RBY" and not RBYFlag:
            # First RBY value: add the column once and start a "**"-joined cell
            line1 += matchObjVars.group(1).strip() + ","
            line2 += matchObjVars.group(2).strip() + "**"
            RBYFlag = True
        elif matchObjVars.group(1).strip() == "RBY":
            line2 += matchObjVars.group(2).strip() + "**"
        else:
            if RBYFlag:
                line2 = line2[:-2] + ","  # Drop the trailing "**" and close the RBY cell
                RBYFlag = False
            line1 += matchObjVars.group(1).strip() + ","
            line2 += matchObjVars.group(2).strip() + ","
    else:
        structFlag = False
        if RBYFlag:
            line2 = line2[:-2] + ","  # Drop the trailing "**" and close the RBY cell
            RBYFlag = False
        matchObjVars = reVar.match(line)
        try:
            line1 += matchObjVars.group(1).strip() + ","
            line2 += matchObjVars.group(2).strip() + ","
        except AttributeError:  # reVar did not match: the line has no value
            line1 += line.strip() + ","
            line2 += " ,"
if RBYFlag:  # The RBY values were the last entries of the group
    line2 = line2[:-2] + ","

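Note that this loop has become quite bloated; it would be better to create a separate function that identifies the type of data and returns values accordingly.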
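One possible way to take the RBY bookkeeping out of the loop, sketched under assumptions: first collect plain `(name, value)` pairs, then merge consecutive repeated entries in a helper of their own. The function name `merge_repeated` and its parameters are hypothetical; the `"**"` joiner and the `RBY` name are taken from the loop above.

```python
def merge_repeated(pairs, names=("RBY",), joiner="**"):
    """Join the values of consecutive pairs whose name is listed in `names`."""
    merged = []
    for name, value in pairs:
        if name in names and merged and merged[-1][0] == name:
            # Same repeated name as the previous pair: extend its cell
            merged[-1] = (name, merged[-1][1] + joiner + value)
        else:
            merged.append((name, value))
    return merged
```

With this helper the loop only has to append pairs, and the header and value rows can be built afterwards from the merged list, which removes the need for `RBYFlag` and the `[:-2]` slicing.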

Solution 1
0 2019-09-15 23:55:57
