Parse logs containing python tracebacks using logstash

I have been trying to parse my python traceback logs using logstash. My logs look like this:

[pid: 26422|app: 0|req: 73/73] 192.168.1.1 () {34 vars in 592 bytes} [Wed Feb 18 13:35:55 2015] GET /data => generated 2538923 bytes in 4078 msecs (HTTP/1.1 200) 2 headers in 85 bytes (1 switches on core 0)
Traceback (most recent call last):
  File "/var/www/analytics/parser.py", line 257, in parselogfile
    parselogline(basedir, lne)
  File "/var/www/analytics/parser.py", line 157, in parselogline
    pval = understandpost(parts[3])
  File "/var/www/analytics/parser.py", line 98, in understandpost
    val = json.loads(dct["events"])
  File "/usr/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Unterminated string starting at: line 1 column 355 (char 354)

So far I've been able to parse the logs except the last line, i.e.

ValueError: Unterminated string starting at: line 1 column 355 (char 354)

I'm using the multiline filter to do so. My logstash configuration looks something like this:

filter {

    multiline {
        pattern => "^Traceback"
        what => "previous"
    }

    multiline {
        pattern => "^ "
        what => "previous"
    }


    grok {
        match => [
            "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
            ]
    }

    if "_grokparsefailure" in [tags] {
        multiline {
            pattern => "^.*$"
            what => "previous"
            negate => "true"
        }
    }

    if "_grokparsefailure" in [tags] {
        grok {
            match => [
                  "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
        ]
            remove_tag => ["_grokparsefailure"]
        }
    }
}

But my last line isn't being parsed. Instead, it is still giving me an error and has also drastically increased the processing time. Any advice on how to parse the last line of the traceback?

Well, I found a solution. The approach I followed is that only a log message starting with '[' begins a new event, and every other line is appended to the end of the previous message. Then the grok filter can be applied and the traceback can be parsed. Note that I have to apply two grok filters:

  1. For when there is a traceback: %{GREEDYDATA:traceback} at the end of the pattern captures it.

  2. For when there is no traceback: the pattern with GREEDYDATA fails to match, so I have to remove the _grokparsefailure tag and then apply grok again without GREEDYDATA. This is done with the help of an if block.

The final logstash filter looks something like this:

filter {

    # any line that does NOT start with '[' is appended to the previous event,
    # so only the "[pid: ...]" request line starts a new message
    multiline {
        pattern => "^[^\[]"
        what => "previous"
    }



    grok {
        match => [
            "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}"
        ]
    }

    if "_grokparsefailure" in [tags] {
        grok {
            match => [
            "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)"
                ]
            remove_tag => ["_grokparsefailure"]
        }
    }

    else {
        mutate {
            convert => {"traceback" => "string"}
        }
    }

    date {
        # the grokked timestamp looks like "Wed Feb 18 13:35:55 2015"
        match => ["timestamp", "EEE MMM dd HH:mm:ss yyyy"]
        locale => "en"
    }
    geoip {
        source => "clientip"
    }
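    # note: the grok patterns above do not produce an "agent" field, so this
    # useragent filter only takes effect if that field is added elsewhere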
    useragent {
        source => "agent"
        target => "Useragent"
    }
}
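
If you also want the exception class and message (the final "ValueError: ..." line) as separate fields, one possible follow-up is an extra grok on the joined event. This is only a hedged sketch: the field names exception_class and exception_message are illustrative and not part of the original configuration, and the (?:Error|Exception) suffix is just a heuristic for typical Python exception names. Note also that GREEDYDATA (.*) does not match across the newlines inserted by the multiline filter, so if the traceback field comes out empty, prefixing the pattern with (?m) is the usual fix.

grok {
    # hedged sketch: pull a "SomethingError: message" line out of the multiline
    # event into two hypothetical fields; tag_on_failure is emptied so events
    # without a traceback are not tagged with _grokparsefailure
    match => [
        "message", "^(?<exception_class>[A-Za-z_][A-Za-z0-9_.]*(?:Error|Exception)): %{GREEDYDATA:exception_message}$"
    ]
    tag_on_failure => []
}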

Alternatively, if you don't want to use the if block to apply a second grok pattern and remove the _grokparsefailure tag, you can use a single grok filter to handle both message types by including multiple patterns in its match array. Since grok tries the patterns in order and stops at the first one that matches (and patterns are not anchored to the end of the message), the more specific pattern, the one ending in %{GREEDYDATA:traceback}, has to be listed first. It can be done like this:

        grok {
            match => [
            "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)%{GREEDYDATA:traceback}",
            "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)"
                ]
        }

And there is a third approach as well (possibly the most elegant one). It looks something like this:

grok {
    match => [
        "message", "\[pid\: %{NUMBER:process_id:int}\|app: 0\|req: %{NUMBER}/%{NUMBER}\] %{IPORHOST:clientip} \(\) \{%{NUMBER:vars:int} vars in %{NUMBER:bytes:int} bytes\} \[%{GREEDYDATA:timestamp}\] %{WORD:method} /%{GREEDYDATA:referrer} \=\> generated %{NUMBER:generated_bytes:int} bytes in %{NUMBER} msecs \(HTTP/%{NUMBER} %{NUMBER:status_code:int}\) %{NUMBER:headers:int} headers in %{NUMBER:header_bytes:int} bytes \(%{NUMBER:switches:int} switches on core %{NUMBER:core:int}\)(%{GREEDYDATA:traceback})?"
    ]
}

Note that in this method, the part of the pattern whose presence is optional has to be enclosed in "()?". Here, that is (%{GREEDYDATA:traceback})?.

Thus, if the traceback is present, the grok filter parses it; otherwise, the optional group is simply skipped.
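
As a minimal, self-contained illustration of the optional group (with a made-up request-line format and hypothetical field names, not the uwsgi pattern above), the following sketch also shows the (?m) prefix, which lets GREEDYDATA run across the newlines of a joined multiline event:

filter {
    grok {
        # "( ... )?" makes the trailing traceback optional; "(?m)" lets
        # GREEDYDATA match across newlines in the joined event
        match => [
            "message", "(?m)%{WORD:method} %{URIPATH:path} HTTP/%{NUMBER:http_version}(%{GREEDYDATA:traceback})?"
        ]
    }
}

For an event like "GET /data HTTP/1.1" followed by traceback lines, traceback captures everything after the request line; for a plain request line the pattern still matches and the optional part is effectively skipped.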
