
Logstash json parse

I'm new to Logstash. I'm currently trying to read files from S3 (each line of a file is a separate JSON document), parse the JSON fields, and send only some of them to ES.

It's amazing how well Logstash supports this; until now everything went smoothly:

input { s3 { ... } }

I didn't even need to say explicitly that the files are gzipped or that the codec is JSON; it still surprises me how Logstash works that out.

But now, if I immediately add:

output { elasticsearch { ... } }

then the whole JSON body lands in a "message" string in Elasticsearch. So I did this:

filter { json { source => "message" } }

After that, I see that every child of my JSON is parsed as a separate value in ES. This is perfect, but what if I want to send only two or three children of the JSON to ES?

My example structure in JSON:

{"path":"/h/asia","headers":{"x-forwarded-for":"1.1.1.1","user-agent":"Mozilla/5.0"},"params":{"filters_values":"test","pagecount":"2","user_status":"unlogged"},"meta":{"date":1538974058,"acceptCookies":true}}

So in the end I land in ES with fields like:

"path.headers.x-forwarded-for", 
"params.pagecount", 
"params.user_status" etc.

My aim is to store only two of them in ES, such as "params.filters_values" and "headers.user-agent".

Thanks in advance

You can use the prune filter to pick the fields you want:

filter {
  prune {
    whitelist_names => [ "params", "headers" ]
  }
}

However, prune only works on top-level fields, so it is not quite what you want.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-prune.html
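
One possible way around the top-level limitation, offered as an untested sketch: move the two wanted nested values up to top-level fields with mutate, then let prune whitelist them. The target names `filters_values` and `user_agent` are my own placeholders, not something from the question. I have also listed `@timestamp` and `@version` on the assumption that prune would otherwise drop them. Note that `whitelist_names` entries are regular expressions, hence the `^...$` anchors.

```
filter {
  json {
    source => "message"
  }
  mutate {
    # Lift the nested values to top-level fields (placeholder names).
    rename => {
      "[params][filters_values]" => "filters_values"
      "[headers][user-agent]"    => "user_agent"
    }
  }
  prune {
    # Keep only the lifted fields plus the event metadata;
    # everything else, including "message", "params" and "headers", is removed.
    whitelist_names => [ "^filters_values$", "^user_agent$", "^@timestamp$", "^@version$" ]
  }
}
```

The trade-off is that the documents in ES end up flat rather than nested, which may or may not suit your mapping.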

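Use the remove_field option of the json filter. Nested fields have to be referenced with the [parent][child] syntax rather than dotted names:

filter {
  json {
    source => "message"
    remove_field => [ "[headers][x-forwarded-for]", "[params][pagecount]", .. ]
  }
}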
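
Applied to the sample event from the question, the full list might look like the sketch below. It is derived only from that one sample, so any key that appears in real events but not in the sample would still reach ES. Dropping `message` as well is my assumption about what is wanted. Nested fields are written in Logstash's `[parent][child]` field reference syntax.

```
filter {
  json {
    source => "message"
    # Remove everything except [params][filters_values] and [headers][user-agent].
    remove_field => [
      "message",
      "path",
      "meta",
      "[headers][x-forwarded-for]",
      "[params][pagecount]",
      "[params][user_status]"
    ]
  }
}
```

Because this is a blacklist, it needs updating whenever the incoming JSON gains a new key; a whitelist approach avoids that at the cost of extra filter steps.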
