
Apache log file parsing via Python

I am writing a Python log parser script in which I need to count the number of entries in a log file whose status code is 200.

Here are some of the entries from the file:

120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 404 231 "http://littlesvr.ca/apng/history.html" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36"

202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 200 115656 "http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"

14.152.69.236 - - [29/Aug/2017:04:41:41 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 304 - "http://bbs.mydigit.cn/read.php?tid=2205351" "Mozilla/5.0 (Linux; U; Android 7.1.2; zh-CN; NX510J Build/NJH47D) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/40.0.2214.89 UCBrowser/11.6.6.951 Mobile Safari/537.36"

60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] "GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 200 115656 "http://bbs.mydigit.cn/read.php?tid=1952896" "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"

58.62.17.190 - - [29/Aug/2017:04:50:01 -0400] "GET /apng/gif_apng_webp1.html HTTP/1.1" 200 935 "http://dev.qq.com/topic/582939577ef9c5b708556b0d" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"

I have tried this code, but the only output I'm getting is a long list of empty brackets []:

#!/usr/bin/env python3

import sys
import re

f = open('accesslogfile', 'r')
print('Reading log files... done.')
nooflines = f.readlines()

for line in nooflines:    
    regex = re.match(r'\d{200}\s', line)
    print(regex)
f.close()

In this case, I know the output should be 3 (as only three entries have the status code 200), but I can't seem to get it. Any help would be appreciated.

Thanks :)

Just change your regex to (200)\s. What you are doing now is matching 200 consecutive digits followed by one whitespace character (such as a space, a tab, or a line break). What you want is to match the token "200 ". So just use (200)\s as your regex. Note that re.match only matches at the beginning of the line, so this pattern has to be used with re.search.
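To make the difference between the quantifier and the literal concrete, here is a minimal sketch. The line is a shortened copy of one of the question's samples (query string, referer, and user agent are trimmed, which is an assumption for brevity), and the leading \s in the second pattern is an extra safeguard not present in the answer above.

```python
import re

# Shortened sample line from the question (referer and user agent trimmed)
line = ('60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] '
        '"GET /apng/images/o_sample.png HTTP/1.1" 200 115656 "-" "Mozilla/5.0"')

# \d{200} means "200 consecutive digits", which never occurs in this line
quantified = re.search(r'\d{200}\s', line)

# A literal 200 with whitespace on both sides matches the status field
literal = re.search(r'\s200\s', line)

print(quantified)             # None
print(repr(literal.group()))  # ' 200 '
```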

You are doing the following things wrong here:

  1. Using match instead of search. match only matches at the beginning of the string, whereas search scans the whole line.
  2. Using {200} instead of {3}. The number in braces is a repetition count, not the value to match.
  3. Not adding \s on both sides of the digits in the regex.
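The first point can be checked directly. This is a small sketch on a truncated copy of one of the question's sample lines (referer and user agent are omitted here, which is an assumption for brevity):

```python
import re

# Truncated sample line from the question (referer and user agent omitted)
line = ('58.62.17.190 - - [29/Aug/2017:04:50:01 -0400] '
        '"GET /apng/gif_apng_webp1.html HTTP/1.1" 200 935')

pattern = r'\s\d{3}\s'

# match anchors at position 0; the line starts with an IP address, not whitespace
at_start = re.match(pattern, line)

# search scans forward until the pattern fits somewhere in the line
anywhere = re.search(pattern, line)

print(at_start)                # None
print(repr(anywhere.group()))  # ' 200 '
```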

So your regex should be:

re.search(r'\s\d{3}\s', line)

So you end up with the following code:

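import re

# Read the whole access log (file name taken from the question)
with open('accesslogfile') as f:
    log = f.read()

counter = 0
for line in log.split('\n'):
    if line:
        # In the sample lines, the first whitespace-delimited 3-digit token is the status code
        regex = re.search(r'\s\d{3}\s', line)
        if regex and regex.group().strip() == '200':
            counter += 1
print('Found', counter)

Output:

Found 3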
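A possible variation, sketched under the assumption that every line follows the layout of the samples (quoted request, then status, then size): anchoring the pattern on the closing quote of the request avoids picking up some other 3-digit token, and a Counter tallies every status code in one pass. The names STATUS_AFTER_REQUEST and count_statuses are hypothetical, and the sample lines are trimmed copies of the question's data with "-" standing in for referer and user agent.

```python
import re
from collections import Counter

# Trimmed copies of the question's sample lines ("-" replaces referer and user agent)
sample = [
    '120.115.144.240 - - [29/Aug/2017:04:40:03 -0400] "GET /apng/assembler-2.0/assembler2.php HTTP/1.1" 404 231 "-" "-"',
    '202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] "GET /apng/images/o_sample.png HTTP/1.1" 200 115656 "-" "-"',
    '14.152.69.236 - - [29/Aug/2017:04:41:41 -0400] "GET /apng/images/o_sample.png HTTP/1.1" 304 - "-" "-"',
    '60.4.236.27 - - [29/Aug/2017:04:42:46 -0400] "GET /apng/images/o_sample.png HTTP/1.1" 200 115656 "-" "-"',
    '58.62.17.190 - - [29/Aug/2017:04:50:01 -0400] "GET /apng/gif_apng_webp1.html HTTP/1.1" 200 935 "-" "-"',
]

# Assumed layout: closing quote of the request, whitespace, 3-digit status, whitespace
STATUS_AFTER_REQUEST = re.compile(r'"\s(\d{3})\s')

def count_statuses(lines):
    """Tally status codes; lines without a recognisable status are skipped."""
    counts = Counter()
    for line in lines:
        found = STATUS_AFTER_REQUEST.search(line)
        if found:
            counts[int(found.group(1))] += 1
    return counts

counts = count_statuses(sample)
print(counts[200])  # 3 for the five sample lines
```

To run it on a real file, the list could be replaced by an open file object, since iterating over a file yields its lines.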

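import pandas

# Split on whitespace; with the default quoting, the quoted fields (request,
# referer, user agent) stay intact, so each line yields 10 columns and
# column 6 holds the status code.
df = pandas.read_csv("log_path", sep=r'\s+', names=list(range(10)))

# Rows whose status code is 200, then their count
ok = df.loc[df[6] == 200]
print(ok)
print(len(ok))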
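To see why column 6 holds the status code, the same whitespace-and-quotes splitting can be approximated with the standard library. This is only an illustration using shlex.split on one of the question's sample lines, not what pandas does internally.

```python
import shlex

# One sample line from the question, split across string literals for readability
line = ('202.167.250.99 - - [29/Aug/2017:04:41:10 -0400] '
        '"GET /apng/images/o_sample.png?1424751982?1424776117 HTTP/1.1" 200 115656 '
        '"http://bbs.mydigit.cn/read.php?tid=2186780&fpage=3" '
        '"Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36"')

# Split on whitespace while keeping double-quoted sections together
fields = shlex.split(line)

for index, value in enumerate(fields):
    print(index, value)

status = fields[6]  # '200'
```

The timestamp is split into two tokens (indices 3 and 4) because it contains a space, which is why the status lands at index 6 rather than 5.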

It's simple:

re.findall(r'(HTTP/1\.1" 200)', line)
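One caveat with this pattern is that it is tied to the literal protocol string HTTP/1.1. The sketch below compares it with a slightly more general pattern; the log lines are hypothetical (made up for illustration, using documentation-range IP addresses), and the assumption is that the protocol is always logged as HTTP/<digit>.<digit>.

```python
import re

# Hypothetical log lines, not taken from the question's file
lines = [
    '192.0.2.1 - - [29/Aug/2017:04:40:03 -0400] "GET /a HTTP/1.1" 200 10',
    '192.0.2.2 - - [29/Aug/2017:04:40:04 -0400] "GET /b HTTP/1.0" 200 10',
    '192.0.2.3 - - [29/Aug/2017:04:40:05 -0400] "GET /c HTTP/2.0" 200 10',
    '192.0.2.4 - - [29/Aug/2017:04:40:06 -0400] "GET /d HTTP/1.1" 404 10',
]

# Pattern tied to HTTP/1.1 only
only_http11 = sum(1 for line in lines if re.search(r'HTTP/1\.1" 200', line))

# Any HTTP/<digit>.<digit> version, followed by the status and a space
any_version = sum(1 for line in lines if re.search(r'HTTP/\d\.\d" 200 ', line))

print(only_http11)  # 1
print(any_version)  # 3
```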

