简体   繁体   中英

logstash remove nested fields using ruby filtering

i have an index with a lot of spam fields (over 300). they are all nested and look like this:

kv.amp-1-234
kv.amp-1-abc
kv.amp-1-efg

so i wanted to do filtering and use remove_fields to get rid of them. wanted to use the prune filter but i can't - they don't support nested key removal. and i can't use filter { mutate { remove_fields

because it doesnt support regex.

i saw that the only way is through ruby filtering:

  ruby {
    code => "
    event.to_hash.keys.each { |k|
    if k.start_with?('[kv.amp-1][k]')
      event.remove(k)
    end
    }
   "
   }

but it doens't work. i just need an example of deletion of nested keys using the ruby filter(no need for regex because start_with? is good enough)

using logstash 5.4.2

the whole config file:

input {

  kafka {
    topics      => "gelfEvents"
    bootstrap_servers => "elk.sonic-dev.us-east-1:9092"
    group_id    => "gelfEventsCG-elk"
    # client_id   => ""
    tags        => [ "gelf2kafka" ]
    codec       => json { charset => "UTF-8" }
  }

}

filter {
  if "gelf2kafka" in [tags] {
    mutate {
      rename => { "short_message" => "message" }
      rename => { "host" => "[host][name]" }

    }
    ruby {
      code => "event.to_hash.keys.each do |key| next unless key[0,1] == '_'; if key == '_' then event.remove(key); next; end; event.set(key[1..-1], event.remove(key)) end"
    }

#trying stuff
#   ruby {
#    code => "
#    event.to_hash.keys.each { |k|
#    if k.start_with?('[tags][k]')
#      event.remove(k)
#    end
#    }
#   "
#   }


    date {
      match => [ "timestamp", "UNIX" ]
      remove_field => [ "timestamp" ]
      # target => "timestamp_new"
    }

    translate {
      field       => "level"
      destination => "level"
      exact       => true
      regex       => false
      override    => true
      fallback    => "no match for %{level}"
      dictionary  => [ "6", "INFO",
                       "4", "WARN",
                       "3", "ERROR",
                       "7", "DEBUG" ]
    }

  }

}

output {
  if "gelf2kafka" in [tags] {
#    stdout { codec => rubydebug }

    elasticsearch {
      # protocol        => "http"
      hosts            => "localhost"
      index           => "logstash-gelf2kafka-%{+YYYY.MM.dd}"
      flush_size      => 100
      idle_flush_time => 1
    }



    statsd {
      host => "localhost"
      namespace => "logstash"
      sender => "_all" # "%{host}"
      increment => "LogsKafka2ElasticSearchAll.gelfEvents.LinesSent"
    }

  }
}

I'm going to assume the square brackets are part of the string you're trying to check for. If so your issue is that you're not interpolating k into the string.

It's gonna be a touch tricky because your code is already within a double-quoted string, but this should work:

k.start_with?(%{[kv.amp-1][#{k}]})

I'd also question whether you need the to_hash call, but that depends on exactly what kind of object event is.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM