
Logstash JSON field removal

We have a heavily nested JSON document containing server metrics. The document contains > 1000 fields, some of which are completely irrelevant to us for analytical purposes, so I would like to remove them before indexing the document in Elastic. However, I am unable to find the correct filter to use, as the fields I want to remove have common names across multiple different objects within the document.

The source document looks like this (reduced in size for brevity):

[
    {
        "server": {
            "is_master": true,
            "name": "MYServer",
            "id": 2111
        },
        "metrics": {
            "Server": {
                "time": {
                    "boundary": {},
                    "type": "TEXT",
                    "display_name": "Time",
                    "value": "2018-11-01 14:57:52"
                }
             },
            "Mem_OldGen": {
                "used": {
                    "boundary": {},
                    "display_name": "Used(mb)",
                    "value": 687
                },
                "committed": {
                    "boundary": {},
                    "display_name": "Committed(mb)",
                    "value": 7116
                },
                "cpu_count": {
                    "boundary": {},
                    "display_name": "Cores",
                    "value": 4
                }
            }
         }
      }
]

The data is loaded into Logstash using the http_poller input plugin and needs to be processed before being sent to Elastic for indexing. I am trying to remove the fields that are not relevant for us to track for analytical purposes; these include the "display_name" and "boundary" fields from each JSON object in the different metrics.
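For context, the input side of the pipeline looks roughly like the sketch below (the URL, schedule, and codec here are placeholders standing in for our real configuration):

input {
  http_poller {
    urls => {
      server_metrics => "http://example.local/api/metrics"
    }
    schedule => { every => "60s" }
    codec => "json"
  }
}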

I have tried using the mutate filter to remove the fields, but because they exist in so many different objects it requires too many hard-coded paths to be added to the Logstash config (see the sketch below). I have also looked at the ruby filter, which seems promising as it can walk the event, but I am unable to get it to crawl the entire JSON document or, more importantly, actually remove the fields.
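For illustration, this is what the mutate approach ends up looking like, with one hard-coded path for every occurrence (using just the fields from the sample above):

filter {
  mutate {
    remove_field => [
      "[metrics][Server][time][display_name]",
      "[metrics][Server][time][boundary]",
      "[metrics][Mem_OldGen][used][display_name]",
      "[metrics][Mem_OldGen][used][boundary]"
    ]
  }
}

With more than 1000 fields this list becomes unmanageable, which is why I am looking for something generic.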

Here is what I was trying as a test:

filter {
  split {
    field => "message"
  }
  ruby {
    code => '
      event.get("[metrics][Mem_OldGen][used]").to_hash.keys.each { |k|
        logger.info("field is:", k)

        if k.include?("display_name")
          event.remove(k)
        end
        if k.include?("boundary")
          event.remove(k)
        end
      }
    '
  }
}

It first splits the input at the message level to create one event per server, then tries to remove the fields from a specific metric.

Any help would be greatly appreciated.

If I understand correctly, you want to keep just the value key. So, considering the response hash:

response = {
        "server": {
            "is_master": true,
            "name": "MYServer",
            "id": 2111
        },
        "metrics": {
...

You could do:

response[:metrics].transform_values { |hh| hh.transform_values { |h| h.delete_if { |k,v| k != :value } } }

#=> {:server=>{:is_master=>true, :name=>"MYServer", :id=>2111}, :metrics=>{:Server=>{:time=>{:value=>"2018-11-01 14:57:52"}}, :Mem_OldGen=>{:used=>{:value=>687}, :committed=>{:value=>7116}, :cpu_count=>{:value=>4}}}}
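Note that delete_if mutates the nested hashes in place, so response itself is modified, not just the return value.

If you need to do this inside the pipeline rather than in plain Ruby, the same idea can be ported into a ruby filter. This is an untested sketch: it drops your field names ("display_name" and "boundary") instead of whitelisting :value, and it calls event.set explicitly in case mutating the object returned by event.get is not reflected back into the event:

filter {
  ruby {
    code => '
      # Walk the structure and drop the unwanted keys at every level.
      def prune(node)
        if node.is_a?(Hash)
          node.delete("display_name")
          node.delete("boundary")
          node.each_value { |v| prune(v) }
        elsif node.is_a?(Array)
          node.each { |v| prune(v) }
        end
      end

      metrics = event.get("metrics")
      unless metrics.nil?
        prune(metrics)
        event.set("metrics", metrics)
      end
    '
  }
}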
