
In Logstash, how to remove any JSON/XML field larger than a specific size

In short, this is the stack we use at my company for our corporate logs:

All Request/Response Log Files -> Filebeat -> Kafka -> Logstash -> Elasticsearch

A very common setup.

However, an unexpectedly formatted request/response may contain a very large XML/JSON field. I want to remove only that specific field/node, at whatever level it sits in the JSON or XML structure, since the request/response can be either SOAP (XML) or REST (JSON).

In other words, I don't know the request/response message tree/structure beforehand, and I don't want to discard the whole message based on its total size, only the specific fields/nodes larger than a certain size.

For example:

2019-12-03 21:41:59.409  INFO 4055 --- [ntainer#0-0-C-1] Transaction Consumer                     : Message received successfully: {"serviceId":"insertEft_TransferPropias","sourceTransaction":"CMMO","xml":"PD94bWw some very large base 64 data ...}

My whole docker-compose file is:

version: '3.2'
services:

  zoo1:
    image: elevy/zookeeper:latest
    environment:
      MYID: 1
      SERVERS: zoo1
    ports:
      - "2181:2181"

  kafka1:
    image: wurstmeister/kafka
    command: [start-kafka.sh]
    depends_on:
      - zoo1
    links:
      - zoo1
    ports:
      - "9092:9092"
    environment:
      KAFKA_LISTENERS: PLAINTEXT://:9092
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka1:9092
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_PORT: 9092
      KAFKA_LOG_RETENTION_HOURS: "168"
      KAFKA_LOG_RETENTION_BYTES: "100000000"
      KAFKA_ZOOKEEPER_CONNECT:  zoo1:2181
      KAFKA_CREATE_TOPICS: "log:1:1"
      KAFKA_AUTO_CREATE_TOPICS_ENABLE: 'true'

  filebeat:
    image: docker.elastic.co/beats/filebeat:7.5.2
    command: filebeat -e -strict.perms=false
    volumes:
      - "//c/Users/Cast/docker_folders/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro"
      - "//c/Users/Cast/docker_folders/sample-logs:/sample-logs"
    links:
      - kafka1
    depends_on:
      - kafka1

  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.2
    environment:
      - cluster.name=docker-cluster
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - xpack.security.enabled=false
      - xpack.watcher.enabled=false
      - discovery.type=single-node
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - "//c/Users/Cast/docker_folders/esdata:/usr/share/elasticsearch/data"
    ports:
      - "9200:9200"

  kibana:
    image: docker.elastic.co/kibana/kibana:7.5.2
    volumes:
      - "//c/Users/Cast/docker_folders/kibana.yml:/usr/share/kibana/config/kibana.yml"
    restart: always
    environment:
    - SERVER_NAME=kibana.localhost
    - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - "5601:5601"
    links:
      - elasticsearch
    depends_on:
      - elasticsearch

  logstash:
    image: docker.elastic.co/logstash/logstash:7.5.2
    volumes:
      - "//c/Users/Cast/docker_folders/logstash.conf:/config-dir/logstash.conf"
    restart: always
    command: logstash -f /config-dir/logstash.conf
    ports:
      - "9600:9600"
      - "7777:7777"
    links:
      - elasticsearch
      - kafka1

logstash.conf

input{
  kafka{
    codec => "json"
    bootstrap_servers => "kafka1:9092"
    topics => ["app_logs","request_logs"]
    tags => ["my-app"]
  }
}

filter {    
    if [fields][topic_name] == "app_logs" {     
        grok {
            match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} *%{LOGLEVEL:level} %{DATA:pid} --- *\[%{DATA:application}] *%{DATA:class} : %{GREEDYDATA:msglog}" }
            tag_on_failure => ["not_date_line"]
        }           
        date {
            match => ["timestamp", "ISO8601"]
            target => "timestamp"
        }   
        if "_grokparsefailure" in [tags] {
            mutate {
                add_field => { "level" => "UNKNOWN" }
            }
        }       
    } 
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "%{[fields][topic_name]}-%{+YYYY.MM.dd}"
  }
}
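
As an aside, the [fields][topic_name] used in the filter conditional and in the Elasticsearch index name has to come from Filebeat. The filebeat.yml is only mounted in the compose file above and not shown, so the following is just a sketch of that assumption (paths and values are illustrative):

filebeat.inputs:
  - type: log
    paths:
      - /sample-logs/*.log
    fields:
      topic_name: app_logs

output.kafka:
  hosts: ["kafka1:9092"]
  topic: '%{[fields.topic_name]}'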

Imagined solution

...
        grok {
            match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} *%{LOGLEVEL:level} %{DATA:pid} --- *\[%{DATA:application}] *%{DATA:class} : %{GREEDYDATA:msglog}" }
            tag_on_failure => ["not_date_line"]
        }
...
        if "_grokparsefailure" in [tags] {
            filter {
              mutate { remove_field => [ "field1", "field2", "field3", ... "fieldN" dinamically discovered based on size ] }
            }
        }
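
Since mutate cannot measure field sizes, the "discovered based on size" part would have to live in a ruby filter. Below is a minimal sketch of the idea, not load-tested: it assumes the JSON body has been isolated into [msglog] (in the sample log line it still carries a text prefix that would need stripping first), and the 1024-character threshold is a made-up placeholder.

    ruby {
      code => '
        require "json"
        max_len = 1024   # hypothetical threshold, in serialized characters

        # walk the parsed tree and drop any entry whose serialized
        # value exceeds max_len, no matter how deeply it is nested
        prune = lambda do |node|
          case node
          when Hash
            node.delete_if { |_k, v| v.to_json.length > max_len }
            node.each_value { |v| prune.call(v) }
          when Array
            node.delete_if { |v| v.to_json.length > max_len }
            node.each { |v| prune.call(v) }
          end
          node
        end

        begin
          tree = JSON.parse(event.get("msglog"))
          event.set("msglog", prune.call(tree).to_json)
        rescue JSON::ParserError, TypeError
          event.tag("msglog_not_json")
        end
      '
    }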

*** Edited

I'm not sure how good this approach is, mainly because it seems to me that I would be forcing Logstash to act as a blocking stage, pulling the whole JSON into memory and parsing it before saving to Elastic. By the way, it has not been tested under stress yet. A colleague of mine came up with this alternative:

input...
filter {
  if "JAVALOG" in [tags] {
    grok {
      match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{WORD:severity} (?<thread>\[.*]) (?<obj>.*)" }
    }
    json {
      source => "obj"
      target => "data"
      skip_on_invalid_json => true
    }
    json {
      source => "[data][entity]"
      target => "request"
      skip_on_invalid_json => true
    }
    mutate { remove_field => [ "message" ] }
    mutate { remove_field => [ "obj" ] }
    mutate { lowercase => [ "[tags][0]" ] }
    mutate { lowercase => [ "meta_path" ] }
    ruby {
      code => '
        request_msg = JSON.parse(event.get("[data][entity]"))
        # iterate over a snapshot of the keys so entries can be
        # deleted safely while looping
        request_msg.keys.each do |key|
          logger.info("field is: #{key}")
          if request_msg[key].to_s.length > 10   # test threshold
            logger.info("field length is greater than 10!")
            request_msg.delete(key)
          end
        end
        # re-serialize as JSON; to_s would emit Ruby hash syntax
        event.set("[data][entity]", request_msg.to_json)
      '
    }
    mutate { remove_field => ["request"] }
    json {
      source => "data"
      target => "data_1"
      skip_on_invalid_json => true
    }
  }
}
output ...
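
Note that this variant only inspects the top-level keys of [data][entity], so an oversized value nested deeper in the tree would survive; for arbitrarily nested SOAP/REST payloads, the recursive walk sketched earlier would still be needed. The length > 10 threshold is obviously just for testing and would be several kilobytes in practice.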

Have you looked at the settings available on the index templates Logstash uses?

Here is an example:

PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "ignore_above": 20 
      }
    }
  }
}

Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-above.html
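
Since the field names here aren't known ahead of time, ignore_above can be applied to every string field through a dynamic template. A sketch using the legacy template API of Elasticsearch 7.x; the template name, index patterns and the 256 limit are placeholders:

PUT _template/request_logs_template
{
  "index_patterns": ["app_logs-*", "request_logs-*"],
  "mappings": {
    "dynamic_templates": [
      {
        "strings_as_bounded_keywords": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    ]
  }
}

Keep in mind that ignore_above only skips indexing: the oversized value still ends up in _source, so this keeps the index lean but does not shrink what is stored per document.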
