How to filter JSON data from a log4j file using logstash?

I have a log file such as the following.

2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.
2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}

You will notice that the last line contains some JSON data. I'm trying to configure Logstash to extract this JSON data. The following is my Logstash config file:

input {
  file {
    path => "C:/Users/TESTER/Desktop/files/test1.log"
    type => "test"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => [ "message" , "timestamp : %{DATESTAMP:timestamp}", "severity: %{WORD:severity}", "clazz: %{JAVACLASS:clazz}", "selco: %{NOTSPACE:selco}", "testerField: (?<ENQDTLS>EnquiryDetails :)"]
  }
}


output {
    elasticsearch {
        hosts => "localhost"
        index => "test1"
    }
    stdout {}
}

However, this is my Logstash output:

C:\logstash-2.0.0\bin>logstash -f test1.conf
io/console not supported; tty will not be manipulated
Default settings used: Filter workers: 2
Logstash startup completed
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,607 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-24 09:44:26,609 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-8] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,399 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 09:55:28,401 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-9] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,135 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2014-12-26 11:10:32,136 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-10] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-24 09:41:29,383 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-4] CSRFToken set successfully.
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,500 INFO c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] in getCSRFToken
2016-01-08T08:02:02.029Z TW 2015-11-30 16:21:09,145 INFO c.t.t.s.a.i.AnalyticsServiceImpl.captureHit [http-bio-8080-exec-9] EnquiryDetails : {"createdTime":1448880669029,"modifiedTime":null,"active":true,"deleted":false,"deletedOn":-1,"guid":null,"uuid":null,"id":130771,"instanceId":130665,"pos":"","channel":"Web","flightNo":"TWBL2DL2","orig":"BLR","dest":"DEL","cabCls":"ECONOMY","logCls":"Y","noOfPaxs":1,"scheduleEntryId":130661,"travelDateTime":[2015,12,1,21,30],"enquiryDateTime":[2015,11,30,16,21,9,23000000]}
2016-01-08T08:02:02.029Z TW 2014-12-26 11:12:40,501 DEBUG c.t.t.a.c.LoginController.getCSRFToken [http-bio-8080-exec-7] CSRFToken set successfully.

Could someone please tell me what I am doing wrong here? Thanks.

You don't say what you're experiencing that's "wrong", but let's assume that you're concerned about the lack of fields in your output.

First, use the rubydebug or json codec in your stdout{} output stanza. It will show you more details.
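For example, a stdout output with the rubydebug codec looks like this:

```
output {
  stdout {
    # pretty-prints every event with all of its fields and tags,
    # so you can see exactly what grok did (or didn't) extract
    codec => rubydebug
  }
}
```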

Second, it looks like your grok{} is all screwed up. grok{} takes an input field and one or more regular expressions to apply against the input. You're giving it the input ("message"), but this regexp:

 "timestamp : %{DATESTAMP:timestamp}"

doesn't match your input since you have no literal string "timestamp : ".

You need something more like:

 "%{DATESTAMP} %{WORD:severity}" (etc)

I would recommend setting up one grok{} stanza to pull all the common info off (everything up to the ]). Then, use another to deal with the different types of messages.
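As a sketch of that two-stage approach (the field names thread, msg, and jsondata are my own choices, and I'm using TIMESTAMP_ISO8601 since your timestamps are in that form; adjust to taste):

```
filter {
  # First pass: pull off the fields every line shares, up to and including
  # the bracketed thread name, leaving the rest in "msg"
  grok {
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:severity} %{JAVACLASS:clazz} \[%{NOTSPACE:thread}\] %{GREEDYDATA:msg}" }
  }

  # Second pass: only the EnquiryDetails messages carry JSON
  if [msg] =~ /^EnquiryDetails :/ {
    grok {
      match => { "msg" => "EnquiryDetails : %{GREEDYDATA:jsondata}" }
    }
    json {
      source => "jsondata"
    }
  }
}
```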

I found a solution to my problem.

input {
  file {
    path => "C:/Users/TESTER/Desktop/elk Files 8-1-2015/test1.log"
    start_position => "beginning"
  }
}

filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} (?<ENQDTLS>EnquiryDetails :) (?<JSONDATA>.*)" }
    add_tag => [ "ENQDTLS" ]
  }

  if "ENQDTLS" not in [tags] {
    drop { }
  }

  mutate {
    remove_tag => ["ENQDTLS"]
  }

  json {
    source => "JSONDATA"
  }

  mutate {
    remove_field => ["timestamp", "clazz", "selco", "severity", "ENQDTLS", "JSONDATA"]
  }
}

output {
  elasticsearch {
    hosts => "localhost"
    index => "test3"
  }
  stdout {
    codec => rubydebug
  }
}

So what I'm doing here is using grok to filter out any line that does not contain the keyword "EnquiryDetails", and then parsing the JSON data in the lines that remain. I hope this helps anyone else who might have the same issue. Also, since I'm new to this, I would like to know whether this is a good approach.
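One possible simplification: by default, grok tags any event its pattern fails to match with "_grokparsefailure", so you can drop the non-matching lines using that built-in tag instead of adding and removing a custom ENQDTLS tag. A sketch of the same filter along those lines:

```
filter {
  grok {
    match => { "message" => "%{DATESTAMP:timestamp} %{WORD:severity} %{JAVACLASS:clazz} %{NOTSPACE:selco} EnquiryDetails : %{GREEDYDATA:JSONDATA}" }
  }

  # grok adds "_grokparsefailure" to [tags] when the pattern does not match,
  # so lines without "EnquiryDetails :" can be dropped directly
  if "_grokparsefailure" in [tags] {
    drop { }
  }

  json {
    source => "JSONDATA"
  }
}
```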
