
python: reading a datetime from a log file using regex

I have a log file with text that looks like this.

Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7) 

There are two time formats in the file. I need to sort this log file by the date and time enclosed in [].

This is the regex I am trying to use, but it does not return anything.

t_pat = re.compile(r".*\[\d+/\D+/.*\]")

I want to go over each line in the file, apply this pattern, and sort the lines by date and time.

Can someone help me with this? Thanks!

There is a space after the opening bracket that the regular expression needs to allow for:

import re

text = "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]^Iat com/avc/abc/magr/service/find.something(abc/1235/locator/abc;Ljava/lang/String;)Labc/abc/abcd/abcd;(bytecode:7)"
# \s* skips the space after "["; the group captures the date and time.
matches = re.findall(r"\[\s*(\d+/\D+/.*?)\]", text)
print(matches)
# ['1/Jul/2013 03:27:12.818']

Next, parse the time using strptime:

http://docs.python.org/2/library/time.html#time.strptime

Note that time.struct_time has no sub-second field, so datetime.datetime.strptime is the safer choice if the milliseconds matter for ordering.

Finally, use the parsed time as a key in a dict with the line as the value, and sort the entries by key. Lines that share a timestamp would overwrite each other in a plain dict, so store a list of lines per key if that can happen.
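The three steps above (extract, parse, sort by key) could be sketched roughly as follows. This is only an illustration: the names T_PAT, TIME_FORMAT and sort_lines_by_timestamp are made up here, it uses datetime.datetime.strptime instead of time.strptime so the milliseconds are kept, and it assumes that lines without a bracketed timestamp can simply be skipped.

```python
import re
from collections import defaultdict
from datetime import datetime

# Same pattern as in the findall example: allow whitespace after "[",
# then capture day/month/rest up to the closing "]".
T_PAT = re.compile(r"\[\s*(\d+/\D+/.*?)\]")
TIME_FORMAT = "%d/%b/%Y %H:%M:%S.%f"


def sort_lines_by_timestamp(lines):
    """Return the lines ordered by their bracketed timestamp."""
    by_time = defaultdict(list)  # timestamp -> all lines carrying it
    for line in lines:
        match = T_PAT.search(line)
        if match is None:
            continue  # assumption: lines without a timestamp are dropped
        stamp = datetime.strptime(match.group(1), TIME_FORMAT)
        by_time[stamp].append(line)

    ordered = []
    for stamp in sorted(by_time):
        ordered.extend(by_time[stamp])
    return ordered


sample = [
    "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <] second",
    "Jul  1 03:27:11 syslog: [m_java][ 1/Jul/2013 03:27:11.002][j:[SessionThread <] first",
]
for line in sort_lines_by_timestamp(sample):
    print(line)
```

Keeping a list per key means two lines with an identical timestamp both survive, in the order they were read.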

You are not matching the initial space. You also want to group the date for easy extraction, and make the \D and .* patterns non-greedy:

t_pat = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")

Demo:

>>> re.compile(r".*\[\s?(\d+/\D+?/.*?)\]").search(line).group(1)
'1/Jul/2013 03:27:12.818'

You can narrow the pattern down further; for example, the month only needs to match 3 letters:

t_pat = re.compile(r".*\[\s?(\d{1,2}/[A-Z][a-z]{2}/\d{4} \d{2}:\d{2}:[\d.]{2,})\]")
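To see what the tighter pattern buys you, here is a small comparison against the looser one. The helper name extract and the second sample line are invented for illustration; only the two patterns come from this answer.

```python
import re

LOOSE = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
STRICT = re.compile(
    r".*\[\s?(\d{1,2}/[A-Z][a-z]{2}/\d{4} \d{2}:\d{2}:[\d.]{2,})\]"
)


def extract(pattern, line):
    """Return the captured timestamp text, or None if nothing matches."""
    match = pattern.search(line)
    return match.group(1) if match else None


good = "Jul  1 03:27:12 syslog: [m_java][ 1/Jul/2013 03:27:12.818][j:[SessionThread <]"
# Hypothetical line whose bracketed text only looks vaguely like a date.
bad = "Jul  1 03:27:12 syslog: [m_java][ 7/abc/not a date] something else"

print(extract(STRICT, good))  # 1/Jul/2013 03:27:12.818
print(extract(LOOSE, bad))    # 7/abc/not a date
print(extract(STRICT, bad))   # None
```

The loose pattern would hand '7/abc/not a date' to strptime, which then fails; the strict one rejects the line up front.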

Read all the lines of the file, then call sort with a key function that parses the date out of each line:

import re
import datetime

# Compile once at module level instead of on every call.
T_PAT = re.compile(r".*\[\s?(\d+/\D+?/.*?)\]")
DATE_FORMAT = '%d/%b/%Y %H:%M:%S.%f'

def parse_date_from_log_line(line):
    # Raises AttributeError if the line has no bracketed timestamp.
    date_string = T_PAT.search(line).group(1)
    return datetime.datetime.strptime(date_string, DATE_FORMAT)

log_path = 'mylog.txt'
with open(log_path) as log_file:
    lines = log_file.readlines()

# Sort in place, oldest first.
lines.sort(key=parse_date_from_log_line)
