Python: Parsing a log file for paired entries

Question

I have a log file that I need to parse for specific events. The problem is that the data I need comes from pairs of event entries, each of which holds part of the data.

For instance, there will be a line with an event type = test with some data, and then shortly after there is another line with an event type = test2 with some more data.

There may be many instances of these pairs of data in the file, or none.

What I need to do is tell the code that when it finds a line with event=test, it should also look for the next instance of event=test2, which is usually a couple of lines later in the log.

This is a sample of the data file:

2020-08-25 03:36:56.006 INFO    Panda HOOK: {"event":"keepalive","time":1600.0064477}
2020-08-25 03:37:01.006 INFO    Panda HOOK: {"event":"keepalive","time":1605.0066958}
2020-08-25 03:37:06.004 INFO    Panda HOOK: {"event":"keepalive","time":1610.004206}
2020-08-25 03:37:11.003 INFO    Panda HOOK: {"event":"keepalive","time":1615.0032498}
2020-08-25 03:37:16.005 INFO    Panda HOOK: {"event":"keepalive","time":1620.0056292}
2020-08-25 03:37:21.001 INFO    Panda HOOK: {"event":"keepalive","time":1625.0011002}
2020-08-25 03:37:26.007 INFO    Panda HOOK: {"event":"keepalive","time":1630.0073155}
2020-08-25 03:37:31.008 INFO    Panda HOOK: {"event":"keepalive","time":1635.0086481}
2020-08-25 03:37:32.687 INFO    Scripting: event:type=test,initiator=Abe Lincoln,place=Washinton,
2020-08-25 03:37:21.001 INFO    Panda HOOK: {"event":"keepalive","time":1625.0011002}
2020-08-25 03:37:26.007 INFO    Panda HOOK: {"event":"keepalive","time":1630.0073155}
2020-08-25 03:37:31.008 INFO    Panda HOOK: {"event":"keepalive","time":1635.0086481}
2020-08-25 03:37:34.414 INFO    Scripting: event:type=test2,t=25277.04,type=comment,

And here is the code I have so far, which gets the first line (2020-08-25 03:37:32.687 INFO Scripting: event:type=test,initiator=Abe Lincoln,place=Washinton,):

f = open('data.log', 'r')
lines = f.readlines()
test2Event = 'event:type=test2'
testEvent = 'event:type=test'
for string in lines:
    if testEvent in string:
        initPerson = string.split('initiator=')[1]
        person = initPerson.split(',')[0]
        print(person)

This code prints the result I want so far, but it also raises an error. I don't understand why, as I have used this exact script with a different string to split on with no problems.

RESULT

Abe Lincoln
Traceback (most recent call last):
  File "main.py", line 15, in <module>
    initPerson = string.split('initiator=')[1]
IndexError: list index out of range

Any suggestions on how to get the next line of data, so that I can combine the data for insertion into a db or similar, would be appreciated, as would any help with why the error is happening, because I do not see what the issue is.

The code and data are available for testing at https://onlinegdb.com/Hyuuj7Mmv

Answer 1

Reading the entire file twice is absolutely excessive. Instead, keep track of what you have done previously while traversing the file.

seen_test = False   # state variable for keeping track of what you have done
init_person = None  # note: snake_case is the Python naming convention, rather than camelCase

with open('data.log', 'r') as f:
    for lineno, line in enumerate(f, start=1):
        if 'event:type=test,' in line:
            if seen_test:
                raise ValueError(
                    'line %i: type=test without test2: %s' % (
                        lineno, line))
            init_person = line.split('initiator=')[1].split(',')[0]
            seen_test = True
        elif 'event:type=test2' in line:
            if seen_test:
                # ... do whatever you want with init_person
                # maybe something like
                result = line.rstrip('\n').split(',')
                print('Test by %s got results %s' % (init_person, result[1:]))
            else:
                raise ValueError(
                    'line %i: type=test2 without test: %s' % (
                        lineno, line))
            seen_test = False
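
Note that the code above searches for 'event:type=test,' with a trailing comma. The following is a small sketch, using two lines copied from the sample data, of why that matters and where the IndexError in the question most likely comes from. The variable names are made up for illustration.

```python
test_line = ("2020-08-25 03:37:32.687 INFO    Scripting: "
             "event:type=test,initiator=Abe Lincoln,place=Washinton,")
test2_line = ("2020-08-25 03:37:34.414 INFO    Scripting: "
              "event:type=test2,t=25277.04,type=comment,")

# Without the trailing comma, the search string is a prefix of
# 'event:type=test2', so both lines match.
loose_matches = ['event:type=test' in line for line in (test_line, test2_line)]

# With the trailing comma, only the type=test line matches.
strict_matches = ['event:type=test,' in line for line in (test_line, test2_line)]

# The test2 line has no 'initiator=', so split() returns a one-element list
# and indexing it with [1] raises IndexError.
parts = test2_line.split('initiator=')

print(loose_matches)   # [True, True]
print(strict_matches)  # [True, False]
print(len(parts))      # 1
```

In other words, the original loop prints Abe Lincoln for the type=test line and then fails on the type=test2 line, which it also treats as a match.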

The enumerate is just to get a useful line number into the error message when there is a failure; if you are sure that the file is always well-formatted, maybe take that out.

This will still fail if the type=test line doesn't contain initiator= but we have no idea what would be useful to do in that scenario so I'm not trying to tackle that.
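
If a missing initiator= should be tolerated rather than treated as fatal, one possible approach is a small helper built on str.partition that returns None when the key is absent. This is only a sketch: extract_field is a hypothetical name, and returning None is an assumption about what the caller would want. It matches the key as a plain substring, so it suits distinctive keys such as initiator but not type, which also occurs inside event:type.

```python
def extract_field(line, key):
    """Return the text between 'key=' and the next comma, or None if absent."""
    # partition() never raises: when the separator is missing, the second
    # and third elements of the result are empty strings.
    _, found, rest = line.partition(key + '=')
    if not found:
        return None
    return rest.split(',', 1)[0]


test_line = ("2020-08-25 03:37:32.687 INFO    Scripting: "
             "event:type=test,initiator=Abe Lincoln,place=Washinton,")
test2_line = ("2020-08-25 03:37:34.414 INFO    Scripting: "
              "event:type=test2,t=25277.04,type=comment,")

print(extract_field(test_line, 'initiator'))   # Abe Lincoln
print(extract_field(test2_line, 'initiator'))  # None
```

The caller can then decide whether None means skipping the entry, logging a warning, or raising an error with the line number.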

Demo: https://repl.it/repls/OverdueFruitfulComputergames#main.py

Answer 2

This should do what you want:

import re

with open('data.log', 'r') as f:
    lines = f.readlines()
results = {}
for line in lines:
    if "Scripting:" in line.strip():
        res = dict(re.findall(r"([^= ]+)=(.+?),", line.strip(), re.DOTALL))
        # if the event type ends with `2` and matches an existing key in results, update that entry
        if res['event:type'][-1] == '2' and res['event:type'][:-1] in results:
            results[res['event:type'][:-1]].update(res)
        else:
            results[res['event:type']] = res

print(results)

# {'test': {'event:type': 'test2', 'initiator': 'Abe Lincoln', 'place': 'Washinton', 't': '25277.04', 'type': 'comment'}}
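
One thing to be aware of: because results is keyed by event type, a second test/test2 pair in the same file would overwrite the first. Below is a sketch of a variant that appends each merged pair to a list instead. It reuses the same regular expression (without re.DOTALL, which a single stripped line does not need). The names collect_pairs, pending and sample are invented for illustration, and a type=test line that never gets a test2 is silently dropped, which is an assumption rather than a requirement from the question.

```python
import re

FIELD_RE = re.compile(r"([^= ]+)=(.+?),")  # same pattern as in the answer


def collect_pairs(lines):
    """Merge each type=test entry with the next type=test2 entry."""
    pairs = []
    pending = None  # fields of a type=test line still waiting for its test2
    for line in lines:
        if "Scripting:" not in line:
            continue
        fields = dict(FIELD_RE.findall(line.strip()))
        event = fields.get("event:type")
        if event == "test":
            pending = fields
        elif event == "test2" and pending is not None:
            pending.update(fields)  # 'event:type' ends up as 'test2'
            pairs.append(pending)
            pending = None
    return pairs


sample = [
    '2020-08-25 03:37:31.008 INFO    Panda HOOK: {"event":"keepalive","time":1635.0086481}',
    '2020-08-25 03:37:32.687 INFO    Scripting: event:type=test,initiator=Abe Lincoln,place=Washinton,',
    '2020-08-25 03:37:34.414 INFO    Scripting: event:type=test2,t=25277.04,type=comment,',
]

pairs = collect_pairs(sample + sample)  # simulate two pairs in one file
print(len(pairs))                       # 2
print(pairs[0]["initiator"], pairs[0]["t"])
```

Each element of the returned list is a plain dict, so it can be passed on to whatever database layer is in use.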

Answer 1: 2, accepted, 2020-08-25 06:26:42

Answer 2: 1, 2020-08-25 06:27:42
