What is the best way to parse a log file into a Python list?
I have an interesting log file format and I would like to parse it into Python for analysis. The format is key=value, with each pair separated by tabs and a newline at the end of each entry, like this:
date="Mon, 04 Jul 2011 05:05:45 GMT" addr=127.0.0.1 response_time=13 method=GET url=/ status=200 referrer= user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30"
Some of the fields may change, so I want the code to be flexible with whatever is thrown at it, as long as it is in the form key=value key=value (etc.).
As of now, I have two nested for loops: one to split each line into the key=value fields, and the other to split each key=value field into its separate key and value.
Does this seem like the best way to go, or is there a more elegant solution?
Two for loops seem fine for this problem. If I were coding it, I'd probably do something like this:
with open('log_file') as f:
    for line in f:
        # Drop the trailing newline so it does not end up in the last value
        fields = line.rstrip('\n').split('\t')
        for field in fields:
            key, _, val = field.partition('=')
            # Do something with each key and val
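To show what "do something with each key and val" might look like, here is a minimal sketch that collects each line into a dict. The helper name `parse_line` and the quote-stripping step are my own assumptions, not part of the question; the sample line is a shortened version of the one above, with real tab characters.

```python
def parse_line(line):
    """Parse one tab-separated key=value line into a dict (hypothetical helper)."""
    entry = {}
    for field in line.rstrip('\n').split('\t'):
        if not field:
            continue  # skip empty fields, e.g. from doubled tabs
        key, _, val = field.partition('=')
        # Assumption: values may be wrapped in one pair of double quotes
        if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
            val = val[1:-1]
        entry[key] = val
    return entry


sample = 'date="Mon, 04 Jul 2011 05:05:45 GMT"\taddr=127.0.0.1\tstatus=200\treferrer=\n'
entry = parse_line(sample)
print(entry)
```

Because the split is on tabs only, spaces and commas inside a value are left alone, and a field like referrer= simply yields an empty string.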
There is probably a module for parsing log files, but a simple homemade way to do this is to use the shlex module:
>>> import shlex
>>> line = """date="Mon, 04 Jul 2011 05:05:45 GMT" addr=127.0.0.1 response_time=13 method=GET url=/ status=200 referrer= user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30" """
# Split key=value pairs. 'value' can contain quoted whitespace.
>>> keyvals = shlex.split(line)
# Separate 'key' and 'val' for each keyval pair.
>>> data = [x.partition('=')[::2] for x in keyvals]
>>> print(data)
[('date', 'Mon, 04 Jul 2011 05:05:45 GMT'), ('addr', '127.0.0.1'), ('response_time', '13'), ('method', 'GET'), ('url', '/'), ('status', '200'), ('referrer', ''), ('user_agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30')]
>>> print(dict(data))
{'date': 'Mon, 04 Jul 2011 05:05:45 GMT', 'addr': '127.0.0.1', 'response_time': '13', 'method': 'GET', 'url': '/', 'status': '200', 'referrer': '', 'user_agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30'}
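The same idea can be applied to a whole file, one dict per line. The sketch below is only an illustration: `parse_log` is a hypothetical name, and an in-memory `io.StringIO` stands in for the real log file. One caveat worth knowing: shlex.split raises ValueError on an unbalanced quote, so a value containing a stray apostrophe would need extra handling.

```python
import io
import shlex


def parse_log(lines):
    """Return a list of dicts, one per non-empty line (hypothetical helper)."""
    entries = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        # shlex.split honours double quotes, so quoted values keep their spaces
        pairs = (token.partition('=')[::2] for token in shlex.split(line))
        entries.append(dict(pairs))
    return entries


# Stand-in for open('log_file'); the contents are made up for illustration
fake_log = io.StringIO(
    'addr=127.0.0.1\tmethod=GET\tuser_agent="Mozilla/5.0 (Macintosh)"\n'
    'addr=10.0.0.2\tmethod=POST\tuser_agent="curl/7.0"\n'
)
entries = parse_log(fake_log)
print(entries)
```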
The csv module for Python would work great here. You can set the delimiter to a tab. This example from the docs uses a space:
>>> import csv
>>> spamReader = csv.reader(open('eggs.csv', newline=''), delimiter=' ', quotechar='|')
>>> for row in spamReader:
...     print(', '.join(row))
Then, inside the loop, you could check each field for the '=' character and, if found, split it into a key-value pair.
result = []
for row in spamReader:
    # Each row is a list of fields, so test and split the fields, not the row
    for field in row:
        if '=' in field:
            s = field.split('=', 1)
            result.append({s[0]: s[1]})
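Adapted to the tab-separated format from the question, the csv approach might look like the sketch below. A few things here are my assumptions rather than part of the answer: quoting is turned off with csv.QUOTE_NONE because the quotes sit in the middle of each field (after key=), so they are stripped by hand afterwards; each row is collected into a single dict; and an `io.StringIO` with made-up contents stands in for the log file.

```python
import csv
import io

# Stand-in for open('log_file', newline=''); contents are illustrative only
fake_log = io.StringIO(
    'date="Mon, 04 Jul 2011 05:05:45 GMT"\tstatus=200\treferrer=\n',
    newline='',
)

# QUOTE_NONE: treat double quotes as ordinary characters and split on tabs only
reader = csv.reader(fake_log, delimiter='\t', quoting=csv.QUOTE_NONE)

entries = []
for row in reader:
    entry = {}
    for field in row:
        key, sep, val = field.partition('=')
        if sep:  # only keep fields that actually contain '='
            entry[key] = val.strip('"')
    entries.append(entry)

print(entries)
```

For this particular format the csv reader does little more than split on tabs, so it is mostly useful if the file later grows real CSV-style quoting.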