
Convert access.log to JSON format using Python

**This is my Python code. I'm trying to convert NGINX logs.

I'm reading logs from the access.log file and using regular expressions to convert them into JSON format, and I then need to upload these logs to Elasticsearch. Please also give some guidance on that part; I'm new to both.**

import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()


regex = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'

for line in lines:

    r = re.match(regex, line)

    if len(r) >= 6:
        result[i] = {'IP address': r[0], 'Time Stamp': r[1], 'HTTP status': r[2], 'Return status':
                     r[3], 'Browser Info': r[4]}
        i += 1
print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)

I'm getting the following error:

Traceback (most recent call last):
   File "/home/zain/Downloads/stack.py", line 17, in <module>
    if len(r) >= 6:
TypeError: object of type 'NoneType' has no len()
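The TypeError arises because re.match returns None when the pattern does not match the line, and here it never does: the pattern expects a literal - right after the status code, but these log lines carry a byte count there (200 3437). Even on a successful match, len(r) would still fail, because a Match object supports indexing but has no length; r[0] is also the whole match rather than the first group. A minimal sketch that checks the question's pattern against one sample line, then a hypothetical adjusted pattern that accepts digits or - for the byte count (the rest of the question's pattern is left unchanged):

```python
import re

# One sample line copied from the question's access.log excerpt.
line = ('127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" '
        '404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) '
        'Gecko/20100101 Firefox/91.0"')

# The question's pattern: it expects a literal "-" right after the status code.
original = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) - "(.*?)" "(.*?)"'
print(re.match(original, line))  # None, so len(r) raises TypeError

# Hypothetical fix: accept a byte count (digits) or "-" after the status.
adjusted = r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+|-) "(.*?)" "(.*?)"'
m = re.match(adjusted, line)
if m is not None:                # always guard against a failed match
    print(len(m.groups()))       # 7 capture groups; use m.groups(), not len(m)
    print(m.group(1), m.group(4), m.group(5))  # group 1 is the IP, not m[0]
```

Note that m[0] (or m.group(0)) is the entire matched text, so the numbered groups start at 1.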

Here are some sample lines from the log:

127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET / HTTP/1.1" 200 3437 "-" "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /icons/openlogo-75.png HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
127.0.0.1 - - [23/May/2022:22:44:14 -0400] "GET /favicon.ico HTTP/1.1" 404 125 "http://localhost/" "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"

The expected output is:

IP Address: 127.0.0.1 Time Stamp: 23/May/2022:22:44:14  HTTP Status: "GET / HTTP/1.1" Return Status: 200 3437  Browser Info: "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
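The expected output keeps the NGINX timestamp as plain text. If the records are later indexed in Elasticsearch, converting the timestamp (offset included) to ISO 8601 generally makes it easier to map as a date field. A small sketch using datetime.strptime; the format string is written for the sample lines above, %b assumes English month abbreviations (Python's default C locale), and the function name nginx_time_to_iso is hypothetical:

```python
from datetime import datetime

def nginx_time_to_iso(stamp: str) -> str:
    """Convert an NGINX timestamp such as '23/May/2022:22:44:14 -0400' to ISO 8601."""
    # %b parses the abbreviated month name; %z parses the -0400 UTC offset.
    parsed = datetime.strptime(stamp, '%d/%b/%Y:%H:%M:%S %z')
    return parsed.isoformat()

print(nginx_time_to_iso('23/May/2022:22:44:14 -0400'))  # 2022-05-23T22:44:14-04:00
```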

I took my cue from this code. I believe the following should do it:

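import json
import re

i = 0
result = {}

with open('access.log') as f:
    lines = f.readlines()

# Named groups capture the client IP, timestamp, request line, "status bytes" and user agent.
# The unnamed group (\".*\") consumes the quoted referrer up to the user agent's opening quote.
# A raw string keeps backslashes such as \d and \[ intact for the regex engine.
regex = r'(?P<ipaddress>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) - - \[(?P<dateandtime>.*)\] \"(?P<httpstatus>(GET|POST) .+ HTTP\/1\.1)\" (?P<returnstatus>\d{3} \d+) (\".*\")(?P<browserinfo>.*)\"'

for line in lines:

    r = re.match(regex, line)

    # re.match returns None when a line does not fit the pattern; skip such lines.
    if r is not None:
        result[i] = {'IP address': r.group('ipaddress'), 'Time Stamp': r.group('dateandtime'),
                     'HTTP status': r.group('httpstatus'), 'Return status':
                     r.group('returnstatus'), 'Browser Info': r.group('browserinfo')}
        i += 1

print(result)

with open('data.json', 'w') as fp:
    json.dump(result, fp)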
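The pattern above only accepts GET and POST requests over HTTP/1.1, requires a numeric byte count, and folds the referrer into an unnamed group. A more general sketch, assuming the logs use NGINX's default combined format (check the log_format directive in your nginx.conf), keeps the referrer as its own field, accepts - as the byte count, and writes a JSON list instead of a dict keyed by line number. The names COMBINED, parse_line and convert, and the field names, are illustrative choices, not part of any library:

```python
import json
import re

# Assumes NGINX's default "combined" log format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

def convert(log_path, json_path):
    """Parse every matching line of log_path, write them to json_path as a list."""
    records = []
    with open(log_path) as f:
        for line in f:                 # iterate lazily instead of readlines()
            record = parse_line(line)
            if record is not None:     # skip lines that do not fit the format
                records.append(record)
    with open(json_path, 'w') as fp:
        json.dump(records, fp, indent=4)
    return len(records)
```

Usage: convert('access.log', 'data.json') returns the number of parsed lines. A list of flat objects is also a convenient shape for indexing each line as one Elasticsearch document.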

Result, printed with print(json.dumps(result, sort_keys=False, indent=4)):

{
    "0": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET / HTTP/1.1",
        "Return status": "200 3437",
        "Browser Info": "Mozilla/5.0   (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "1": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /icons/openlogo-75.png HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    },
    "2": {
        "IP address": "127.0.0.1",
        "Time Stamp": "23/May/2022:22:44:14 -0400",
        "HTTP status": "GET /favicon.ico HTTP/1.1",
        "Return status": "404 125",
        "Browser Info": "Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0"
    }
}
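For the Elasticsearch part of the question, one option is the _bulk REST endpoint, which takes newline-delimited JSON (NDJSON) where each document is preceded by an action line. A stdlib-only sketch, assuming an unsecured local node at http://localhost:9200 and a hypothetical index name nginx-logs; a secured cluster (the default from Elasticsearch 8 onward) additionally needs HTTPS and credentials. The function names build_bulk_body and send_bulk are hypothetical, and the official elasticsearch Python client's helpers.bulk is a higher-level alternative:

```python
import json
import urllib.request

def build_bulk_body(records, index):
    """Build an NDJSON body for the Elasticsearch _bulk API."""
    lines = []
    for record in records:
        # Action line: index this document into `index` (Elasticsearch assigns the _id).
        lines.append(json.dumps({'index': {'_index': index}}))
        # Source line: the document itself.
        lines.append(json.dumps(record))
    # The bulk body must end with a newline.
    return '\n'.join(lines) + '\n'

def send_bulk(records, index='nginx-logs', url='http://localhost:9200/_bulk'):
    """POST the records to a (hypothetical, unsecured) local Elasticsearch node."""
    body = build_bulk_body(records, index).encode('utf-8')
    req = urllib.request.Request(
        url, data=body, method='POST',
        headers={'Content-Type': 'application/x-ndjson'})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # the parsed bulk response
```

Usage, assuming the data.json written by the answer's script (a dict keyed by line number): load it with json.load, pass list(data.values()) to send_bulk, and check the "errors" flag in the returned response, which is true if any individual document failed.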
