What is the best way to parse a log file into a Python list?

Question

I have an interesting log file format and I would like to parse it into Python for analysis.

The format is key=value pairs separated by tabs, with a newline at the end of each entry, like this:

date="Mon, 04 Jul 2011 05:05:45 GMT"    addr=127.0.0.1  response_time=13    method=GET  url=/   status=200  referrer=   user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30"

Some of the fields may change, so I want the code to be flexible about what it receives, as long as the input stays in the form key=value key=value (etc.).

Right now I have two nested for loops: one splits each line into its key=value fields, and the other splits each key=value field into a separate key and value.

Does this seem like the best way to go or is there a more elegant solution?

Answer 1

Two for loops seem fine for this problem. If I were coding it, I'd probably do something like this:

with open('log_file') as f:
    for line in f:
        # Strip the trailing newline so it doesn't end up in the last value
        fields = line.rstrip('\n').split('\t')
        for field in fields:
            key, _, val = field.partition('=')
            # Do something with each key and val
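
The "do something" step above is left open. Here is a minimal sketch of one way to fill it, assuming the goal is a list with one dict per log entry, that no value contains a tab, and that a double-quoted value (like date="...") should have its surrounding quotes removed. `parse_log` is a hypothetical helper name, not part of any library:

```python
def parse_log(lines):
    """Parse tab-separated key=value lines into a list of dicts.

    Assumes fields are separated by tabs and that no value
    (quoted or not) contains a tab itself.
    """
    entries = []
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        entry = {}
        for field in line.split('\t'):
            key, _, val = field.partition('=')
            # Drop one pair of surrounding double quotes, if present
            if len(val) >= 2 and val[0] == '"' and val[-1] == '"':
                val = val[1:-1]
            entry[key] = val
        entries.append(entry)
    return entries
```

Because `parse_log` takes any iterable of lines, you can pass it an open file (`with open('log_file') as f: entries = parse_log(f)`) or a plain list of strings. Entries may carry different keys, so `entry.get('status')` avoids a KeyError when a field is missing from a given line.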

Answer 2

There is probably a module for parsing log files, but a simple homemade way to do this is the shlex module, which splits a string on whitespace while keeping quoted text together:

>>> import shlex
>>> line = """date="Mon, 04 Jul 2011 05:05:45 GMT" addr=127.0.0.1 response_time=13 method=GET url=/ status=200 referrer= user_agent="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30" """
>>> # Split into key=value pairs. A value can contain quoted whitespace.
>>> keyvals = shlex.split(line)
>>> # Separate 'key' and 'val' for each keyval pair.
>>> data = [x.partition('=')[::2] for x in keyvals]
>>> print(data)
[('date', 'Mon, 04 Jul 2011 05:05:45 GMT'), ('addr', '127.0.0.1'), ('response_time', '13'), ('method', 'GET'), ('url', '/'), ('status', '200'), ('referrer', ''), ('user_agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30')]
>>> print(dict(data))
{'date': 'Mon, 04 Jul 2011 05:05:45 GMT', 'addr': '127.0.0.1', 'response_time': '13', 'method': 'GET', 'url': '/', 'status': '200', 'referrer': '', 'user_agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/534.30 (KHTML, like Gecko) Chrome/12.0.742.112 Safari/534.30'}
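
The session above handles a single line. Here is a minimal sketch extending the same idea to a whole file, producing a list of dicts. It assumes every field has the form key=value and that values containing whitespace are double-quoted; `parse_log_shlex` is a hypothetical name. Since `shlex.split` splits on any whitespace, it works whether the fields are separated by tabs (as in the question) or spaces (as in the session above).

```python
import shlex


def parse_log_shlex(lines):
    """Return a list of dicts, one per non-blank log line.

    shlex.split() splits on whitespace (tabs included) but keeps
    double-quoted text together and removes the quotes.
    """
    entries = []
    for line in lines:
        tokens = shlex.split(line)
        if not tokens:
            continue  # blank or whitespace-only line
        # partition('=')[::2] keeps (key, value) and drops the '='
        entries.append(dict(tok.partition('=')[::2] for tok in tokens))
    return entries
```

One design caveat: shlex uses POSIX shell rules, so a line with an unmatched quote (for example a lone apostrophe inside a user agent) makes `shlex.split` raise `ValueError`. If your logs can contain such values, catch that error per line.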

Answer 3

The csv module would work well here, since you can set the delimiter to a tab. This example from the docs uses a space as the delimiter:

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamReader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamReader:
...         print(', '.join(row))

Then, inside the loop, you could check each field for the '=' character and, if it is found, split that field into a key/value pair.

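import csv
result = []
with open('log_file', newline='') as f:
    for row in csv.reader(f, delimiter='\t'):
        entry = {}  # one dict per log line
        for field in row:  # row is a list of field strings
            if '=' in field:
                key, _, val = field.partition('=')
                entry[key] = val
        result.append(entry)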
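
One caveat with the csv approach: the reader only treats the quote character as special at the very start of a field, so a field such as date="Mon, 04 Jul 2011 05:05:45 GMT" comes through with its quotes intact. Below is a self-contained sketch that strips them afterwards. It uses an in-memory sample via `io.StringIO` in place of a real file, and `parse_with_csv` is a hypothetical helper name:

```python
import csv
import io


def parse_with_csv(f):
    """Read tab-separated key=value fields with csv.reader into a list of dicts."""
    entries = []
    for row in csv.reader(f, delimiter='\t'):
        if not row:
            continue  # csv.reader yields [] for blank lines
        entry = {}
        for field in row:
            key, sep, val = field.partition('=')
            if not sep:
                continue  # ignore fields without '='
            # The quote after 'key=' is not at the start of the field,
            # so csv leaves it in place; strip one surrounding pair here.
            if len(val) >= 2 and val.startswith('"') and val.endswith('"'):
                val = val[1:-1]
            entry[key] = val
        entries.append(entry)
    return entries


sample = io.StringIO(
    'date="Mon, 04 Jul 2011 05:05:45 GMT"\taddr=127.0.0.1\tstatus=200\n'
    '\n'
    'method=GET\turl=/\n'
)
print(parse_with_csv(sample))
```

For this format, csv.reader with a tab delimiter behaves much like `line.split('\t')`. Its main benefit is handling edge cases such as blank lines and different newline conventions for you.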

3 answers

solution1
2 ACCEPTED 2011-07-27 01:51:59

solution2
1 2011-07-27 01:53:53

solution3
1 2011-07-27 01:57:35
