Request log parser - text parsing

Question

I have to parse a request log with the following structure:

07/Dec/2017:18:15:58 +0100 [293920] -> GET URL HTTP/1.1
07/Dec/2017:18:15:58 +0100 [293920] <- 200 text/html 5ms
07/Dec/2017:18:15:58 +0100 [293921] -> GET URL HTTP/1.1
07/Dec/2017:18:15:58 +0100 [293921] <- 200 image/png 39ms
07/Dec/2017:18:15:59 +0100 [293922] -> HEAD URL HTTP/1.0
07/Dec/2017:18:15:59 +0100 [293922] <- 401 - 1ms
07/Dec/2017:18:15:59 +0100 [293923] -> GET URL HTTP/1.1
07/Dec/2017:18:15:59 +0100 [293923] <- 200 text/html 178ms
07/Dec/2017:18:15:59 +0100 [293924] -> GET URL HTTP/1.1
07/Dec/2017:18:15:59 +0100 [293924] <- 200 text/html 11ms
07/Dec/2017:18:15:59 +0100 [293925] -> GET URL HTTP/1.1
07/Dec/2017:18:15:59 +0100 [293925] <- 200 text/html 7ms
07/Dec/2017:18:15:59 +0100 [293926] -> GET URL HTTP/1.1
07/Dec/2017:18:15:59 +0100 [293926] <- 200 text/html 16ms
07/Dec/2017:18:15:59 +0100 [293927] -> GET URL HTTP/1.1
07/Dec/2017:18:15:59 +0100 [293927] <- 200 text/html 8ms

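The output should link the two lines in this log that share the same number between the square brackets. The goal is to extract information from this log file for use in other data-processing packages. I want to extract the useful information into a CSV file, structured as follows.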

startTimestamp,endTimestamp,requestType/responseCode,URL/typ,responsetime

07/Dec/2017:18:15:58,07/Dec/2017:18:15:58,GET,200,URL,text/html,5ms

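I wrote a Groovy script that does this, but it is very slow.

I know some improvements are possible, and I am hoping you have ideas. Some of you may have solved this problem before.

A response does not always directly follow its request, and not every request gets a response (or the response was not logged because of a server restart).

The log files range from 70 MB to 300 MB, and my Groovy script takes a very long time on them.

I know there are good, fast solutions using awk and sort in a Unix terminal, but I have no experience with them.

Thanks in advance for your help.

Here is the code I already have, along with the improvements I can think of:

1) Use a map with the bracketed number as the key, to speed up lookups and reduce parsing

2) Don't scan the backlog list for every line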

def logFile = new File("../request.log")
def outputfile = new File(logFile.parent, logFile.name + ".csv")
def backlog = new ArrayList<String>()
StringBuilder output = new StringBuilder()


outputfile.withPrintWriter { writer ->
    logFile.withReader { Reader reader ->
        reader.eachLine { String line ->
            Iterator<String> it = backlog.iterator()
            while (it.hasNext()) {
                String bLine = it.next()
                String[] lineSplit = line.split(" ")
                if (bLine.contains(lineSplit[2])) {
                    String[] bLineSplit = bLine.split(" ")
                    output.append(bLineSplit[0] + "," + lineSplit[0] + "," + bLineSplit[4] + "," + lineSplit[4] + "," + bLineSplit[5] + "," + lineSplit[5] + "," + lineSplit[6] + "\r\n")
                    //writer.println(outputline)
                    it.remove()
                }
            }
            backlog.add(line)
        }
    }
    writer.println(output)
    if (!backlog.isEmpty()) {
    }
    backlog.each { String line ->
        writer.println(line)
    }
}
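
Improvement 1 above (a map keyed by the bracketed number) could look roughly like the following single-pass awk sketch. It is only an illustration under a few assumptions: the helper name `pair_requests` is made up for this example, fields are separated by single spaces exactly as in the sample log, and a request ID is not reused while its request is still waiting for a response. Requests that never get a response are reported on stderr rather than written to the CSV.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): reads the request log
# on stdin and writes one CSV row per matched request/response pair.
pair_requests() {
    awk '
        BEGIN {
            OFS = ","
            print "startTimestamp,endTimestamp,requestType,responseCode,URL,typ,responsetime"
        }
        # Request line: remember timestamp, method and URL under the bracketed ID.
        $4 == "->" {
            start[$3] = $1; method[$3] = $5; url[$3] = $6
            next
        }
        # Response line: emit a row only if the matching request was seen,
        # then forget the ID so memory stays bounded by in-flight requests.
        $4 == "<-" && ($3 in start) {
            print start[$3], $1, method[$3], $5, url[$3], $6, $7
            delete start[$3]; delete method[$3]; delete url[$3]
        }
        # Requests that never received a response go to stderr.
        END {
            for (id in start) print "unmatched request " id > "/dev/stderr"
        }
    '
}
```

Usage would be something like `pair_requests < request.log > request.log.csv`. Because each line is looked up by key instead of being compared against every backlog entry, the work per line no longer grows with the size of the backlog.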

Answer 1

As a one-liner:

sort -k 3,3 request.log | awk 'BEGIN { print "startTimestamp;endTimestamp;requestType;responseCode;URL;typ;responsetime"; split("", request); split("", response) } $4 == "->" { printLine(); split($0, request); split("", response) } $4 == "<-" { split($0, response) } END { printLine() } function printLine() { if (length(request)) { print request[1] ";" response[1] ";" request[5] ";" response[5] ";" request[6] ";" response[6] ";" response[7] } }'

As a multi-line script:

sort -k 3,3 request.log | awk '
    BEGIN {
        print "startTimestamp;endTimestamp;requestType;responseCode;URL;typ;responsetime"
        split("", request)
    }
    $4 == "->" {
        printLine()
        split($0, request)
        split("", response)
    }
    $4 == "<-" {
        split($0, response)
    }
    END {
        printLine()
    }
    function printLine() {
        if (length(request)) {
            print request[1] ";" response[1] ";" request[5] ";" response[5] ";" request[6] ";" response[6] ";" response[7]
        }
    }'
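
One case the script above does not guard against is a response whose request was never logged: after sorting, that response follows a different request and would be attached to it. The variant below is a hedged sketch of one way to handle this, not part of the original answer. The function name `pair_sorted` is invented for illustration, and it assumes a `sort` that supports the `-s` (stable) flag, as GNU and BSD sort do, so that lines with the same ID keep their logged order.

```shell
#!/bin/sh
# Hypothetical variant (not from the original answer): sort by the bracketed
# ID, then pair each request only with a response carrying the same ID.
# Reads the files given as arguments, or stdin when called without any.
pair_sorted() {
    sort -s -k 3,3 "$@" | awk '
        # Print the pending request, with empty response fields if none arrived.
        function emit_row() {
            if (id != "") print reqTime, resTime, method, status, url, ctype, took
            id = ""
        }
        BEGIN {
            OFS = ";"
            print "startTimestamp;endTimestamp;requestType;responseCode;URL;typ;responsetime"
        }
        $4 == "->" {
            emit_row()
            id = $3; reqTime = $1; method = $5; url = $6
            resTime = status = ctype = took = ""
            next
        }
        # A response is accepted only when its ID matches the pending request;
        # responses without a logged request are skipped.
        $4 == "<-" && $3 == id {
            resTime = $1; status = $5; ctype = $6; took = $7
        }
        END { emit_row() }
    '
}
```

Called as `pair_sorted request.log > request.log.csv`, it keeps the semicolon-separated layout of the answer and still prints requests without a response, leaving their response columns empty.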

Answer 1
0 accepted 2017-12-12 11:28:16
