
Log file management with Python

I have a file that contains many different events from a service. I want to put each event on its own line and remove some words and elements. Example of the log file:

"Event1":{"Time":"2022-12-16 16:04:16","Username":"IAmUser1@gmail.com","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"IAmUser2@gmail.com","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"IAmUser3@gmail.com","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},

As you can see, every event starts with "EventX". In the end I want to see:

{"Time":"2022-12-16 16:04:16","Username":"IAmUser1@gmail.com","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"}
{"Time":"2022-12-16 16:03:59","Username":"IAmUser2@gmail.com","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"}
{"Time":"2022-12-16 15:54:56","Username":"IAmUser3@gmail.com","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"}

As you can see, "EventX": and the separating "," are removed, and each event is now on its own line in the file.

I'm a beginner with Python and cannot figure this one out.

Thanks

I tried combining re.search and re.findall without luck. I also tried to find a way to copy only the text between {} and add it back later, again without luck.

The construct below works and builds a list of dictionaries from your data. You could condense some of this syntax with list or dictionary comprehensions, but that isn't needed.

If you are having trouble testing regular expressions, this site is invaluable.

Code

import re  # the standard-library module is enough here; "regex" is a separate third-party package

data = '''"Event1":{"Time":"2022-12-16 16:04:16","Username":"IAmUser1@gmail.com","IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},"Event2":{"Time":"2022-12-16 16:03:59","Username":"IAmUser2@gmail.com","IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},"Event3":{"Time":"2022-12-16 15:54:56","Username":"IAmUser3@gmail.com","IP_Address":"1.1.1.1","Action":"Action3","Data":"Datahere"},'''

splitter = r'"Event\d+":{(.*?)}'  # a search pattern to capture the stuff in braces

# tokenize the data source...
tokens = re.findall(splitter, data)

#print(tokens)


# now we can operate on the tokens and split them up into key-value pairs and put them into a list
result = []
for token in tokens:
    # make an empty dictionary to hold the row elements
    line_dict = {}
    # split the token on commas to get the key-value pairs
    # (this assumes no value contains a comma itself)
    pairs = token.split(',')
    for pair in pairs:
        # a second regex is needed here, because the timestamps contain colons too
        pair_pattern = r'"(.*)"\s*:\s*"(.*)"'    # capture the quoted text on either side of the colon
        parts = re.search(pair_pattern, pair)
        key, value = parts.group(1), parts.group(2)
        line_dict[key] = value
    # add the dictionary of line elements to the result
    result.append(line_dict)

for d in result:
    print(d)

Output:

{'Time': '2022-12-16 16:04:16', 'Username': 'IAmUser1@gmail.com', 'IP_Address': '1.1.1.1', 'Action': 'Action1', 'Data': 'Datahere'}
{'Time': '2022-12-16 16:03:59', 'Username': 'IAmUser2@gmail.com', 'IP_Address': '1.1.1.1', 'Action': 'Action2', 'Data': 'Datahere'}
{'Time': '2022-12-16 15:54:56', 'Username': 'IAmUser3@gmail.com', 'IP_Address': '1.1.1.1', 'Action': 'Action3', 'Data': 'Datahere'}
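
The output above is the Python representation of each dictionary (single quotes), while the question asked for the original double-quoted text, one event per line. Because each event body looks like valid JSON, one possible alternative is to let the standard json module do the parsing. This is only a sketch: it assumes the whole text is a comma-separated run of "EventN": {...} pairs, and the helper name events_to_json_lines is made up for illustration.

```python
import json

# Shortened sample in the same shape as the log file from the question
data = ('"Event1":{"Time":"2022-12-16 16:04:16","Username":"IAmUser1@gmail.com",'
        '"IP_Address":"1.1.1.1","Action":"Action1","Data":"Datahere"},'
        '"Event2":{"Time":"2022-12-16 16:03:59","Username":"IAmUser2@gmail.com",'
        '"IP_Address":"1.1.1.1","Action":"Action2","Data":"Datahere"},')


def events_to_json_lines(raw):
    """Return one compact JSON string per event (hypothetical helper)."""
    # Assumption: dropping the trailing comma and wrapping the text in braces
    # turns it into a single valid JSON object keyed by "Event1", "Event2", ...
    wrapped = '{' + raw.strip().rstrip(',') + '}'
    events = json.loads(wrapped)
    # separators=(',', ':') removes the spaces json.dumps inserts by default
    return [json.dumps(event, separators=(',', ':')) for event in events.values()]


lines = events_to_json_lines(data)
for line in lines:
    print(line)
```

If the file contains anything other than such pairs, json.loads raises an error instead of silently skipping text, which may or may not be what you want.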

========= =========

Edit:

If you are having trouble getting the data out of the file, try something like this and experiment; it isn't clear exactly how your file is formatted (line breaks, etc.).

f_name = 'logfile.txt'

# use a context manager so the file is closed automatically
with open(f_name, 'r') as src:
    data = src.read()  # one string, which is what re.findall expects; readlines() would give a list

# check it!
print(data)
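
To go all the way from the log file to a new file with one event per line, the reading step and the regex step can be combined roughly as follows. This is a sketch under assumptions, not a tested solution for your real file: it assumes the event bodies contain no nested braces, and the function name split_events and the output name events.txt are invented for the example. The demo writes a small throwaway file so that it runs on its own.

```python
import re
import tempfile
from pathlib import Path

# Matches one "EventN":{...} entry and captures the braces with their contents.
# Assumption: no event body contains nested braces.
EVENT_PATTERN = re.compile(r'"Event\d+"\s*:\s*(\{.*?\})', re.DOTALL)


def split_events(src_path, dst_path):
    """Write each event object found in src_path on its own line in dst_path."""
    text = Path(src_path).read_text(encoding='utf-8')
    events = EVENT_PATTERN.findall(text)
    Path(dst_path).write_text('\n'.join(events) + '\n', encoding='utf-8')
    return events


# Demo on a temporary file shaped like the log in the question
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'logfile.txt'
    dst = Path(tmp) / 'events.txt'
    src.write_text(
        '"Event1":{"Time":"2022-12-16 16:04:16","Action":"Action1"},'
        '"Event2":{"Time":"2022-12-16 16:03:59","Action":"Action2"},',
        encoding='utf-8',
    )
    events = split_events(src, dst)
    written = dst.read_text(encoding='utf-8')
    print(written)
```

For your own data you would call split_events('logfile.txt', 'events.txt') instead of the demo block.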
