
Logstash - import nested JSON into Elasticsearch

I have a large number (~40000) of nested JSON objects that I want to insert into an Elasticsearch index.

The JSON objects are structured like this:

    {
        "customerid": "10932",
        "date": "16.08.2006",
        "bez": "xyz",
        "birthdate": "21.05.1990",
        "clientid": "2",
        "address": [
            {
                "addressid": "1",
                "title": "Mr",
                "street": "main str",
                "valid_to": "21.05.1990",
                "valid_from": "21.05.1990"
            },
            {
                "addressid": "2",
                "title": "Mr",
                "street": "melrose place",
                "valid_to": "21.05.1990",
                "valid_from": "21.05.1990"
            }
        ]
    }

So a JSON field (address in this example) can contain an array of JSON objects.

What would a Logstash config look like to import JSON files/objects like this into Elasticsearch? The Elasticsearch mapping for this index should simply mirror the structure of the JSON. The Elasticsearch document id should be set to customerid.

input {
  stdin {
    id => "JSON_TEST"
  }
}
filter {
  json {
    source => "customerid"
    ....
    ....
  }
}
output {
  stdout {}
  elasticsearch {
    hosts => "https://localhost:9200/"
    index => "customers"
    document_id => "%{customerid}"
  }
}

If you have control over what's being generated, the easiest thing to do is to format your input as single-line JSON (one complete object per line) and then use the json_lines codec.

Just change your stdin to:

stdin { codec => "json_lines" }

and then it'll just work:

cat input_file.json | logstash -f json_input.conf

where input_file.json has lines like:

{"customerid":1,"nested": {"json":"here"}}
{"customerid":2,"nested": {"json":"there"}}

and then you won't need the json filter.
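
Putting the pieces together, the whole pipeline might look like the sketch below. It reuses the hosts, index, and document_id settings from the question unchanged and assumes that every input line is one complete JSON object with a top-level customerid field; it has not been run against the original data.

```
input {
  stdin {
    id => "JSON_TEST"
    # Parse each newline-delimited line as one JSON document.
    codec => "json_lines"
  }
}
# No filter block: the codec has already turned the JSON into event fields,
# including nested objects and arrays such as "address".
output {
  stdout {}
  elasticsearch {
    # Values copied from the question; adjust for your cluster.
    hosts => "https://localhost:9200/"
    index => "customers"
    # Use the customerid field of each event as the document id.
    document_id => "%{customerid}"
  }
}
```

Because the codec parses the entire line, a nested array like address is sent to Elasticsearch as it is, so the indexed document should keep the same shape as the source JSON.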
