
Elasticsearch mapping of nested structure

I'm looking for some pointers on mapping a somewhat dynamic structure for consumption by Elasticsearch.

The raw structure itself is JSON, but part of it uses variable key names rather than static ones, so the outer elements of the structure are not fixed.

Here is a somewhat redacted example of my JSON:

{
    "stat": {
        "state": "valid",
        "duration": 5
    },
    "12345-abc": {
        "content_length": 5,
        "version": 2
    },
    "54321-xyz": {
        "content_length": 2,
        "version": 1
    }
}

The first block is easy: Elasticsearch does a great job of mapping the "stat" portion of the structure, and if I dumped a lot of that data into an index it would work as expected. The problem is the next two blocks. They are essentially the same kind of object, but the raw JSON keys each one by a unique identifier ("12345-abc", "54321-xyz"), and by default Elasticsearch maps each of those keys as a separate object field, generating a mapping that looks like this:

"stat": {
    "properties": {
        "state": {
            "type": "string"
        },
        "duration": {
            "type": "double"
        }
    }
},
"12345-abc": {
    "properties": {
        "content_length": {
            "type": "double"
        },
        "version": {
            "type": "double"
        }
    }
},
"54321-xyz": {
    "properties": {
        "content_length": {
            "type": "double"
        },
        "version": {
            "type": "double"
        }
    }
}

I'd like to be able to index all of the "content_length" data together, but it gets split into a separate field for every key. Because some of the variable key names are long, when I load the data into Kibana I end up with very long field names (such as "12345-abc.content_length") that are next to useless.
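To make the problem concrete, here is a simplified Python illustration (not Elasticsearch's own code; `leaf_field_paths` is a hypothetical helper) that lists the dotted field names you get by flattening nested objects the way dynamic mapping names them. Every new variable key adds another `content_length`/`version` pair, which is why the field list keeps growing.

```python
def leaf_field_paths(doc, prefix=""):
    """Return dotted field names for the leaf values of a nested dict.

    Simplified: arrays and value types are ignored; only object nesting
    is turned into "parent.child" names.
    """
    paths = []
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse into sub-objects, extending the dotted prefix.
            paths.extend(leaf_field_paths(value, prefix=path + "."))
        else:
            paths.append(path)
    return paths


doc = {
    "stat": {"state": "valid", "duration": 5},
    "12345-abc": {"content_length": 5, "version": 2},
    "54321-xyz": {"content_length": 2, "version": 1},
}

for path in leaf_field_paths(doc):
    print(path)
```

For the example document this prints `stat.state`, `stat.duration`, `12345-abc.content_length`, `12345-abc.version`, `54321-xyz.content_length` and `54321-xyz.version`: two distinct `content_length` fields for two documents' worth of data.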

Is it possible to give these blocks a generic tag in the mapping? Or is this more easily addressed at the JSON generation phase, with our developers hard-coding a generic structure name and adding a field that holds the identifier?
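A minimal sketch of the JSON-generation-side option, assuming that every top-level key other than a known set of static keys (here just "stat") holds one of the variable-keyed objects. The function `restructure` and its parameters are hypothetical; the default names "things" and "thingkey" follow the example in the answer.

```python
import json


def restructure(doc, static_keys=("stat",), list_field="things", key_field="thingkey"):
    """Move every variable-keyed object into a list under `list_field`.

    The original key is kept in `key_field`, so each entry shares the
    same field names regardless of its identifier.
    """
    # Copy the static parts through unchanged.
    result = {k: doc[k] for k in static_keys if k in doc}
    items = []
    for key, value in doc.items():
        if key in static_keys:
            continue
        # Build a new dict so the input document is not mutated.
        item = {key_field: key}
        item.update(value)
        items.append(item)
    result[list_field] = items
    return result


raw = {
    "stat": {"state": "valid", "duration": 5},
    "12345-abc": {"content_length": 5, "version": 2},
    "54321-xyz": {"content_length": 2, "version": 1},
}

print(json.dumps(restructure(raw), indent=2))
```

However many identifiers appear, the output only ever contains the fields `things.thingkey`, `things.content_length` and `things.version`.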

Any insight / help greatly appreciated.

Thanks!

If keys like 12345-abc are generated and can take a potentially unbounded number of values, it will be hard (if not impossible) to run useful queries or aggregations. It's not entirely clear what your exact use case for analyzing the data is, but you should probably look at nested objects ( https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html ) and generate your input JSON according to what you want to query for. You will likely get better aggregation results if you put these additional objects into an array, with a dedicated field holding what is currently the key. For example:

{
  "stat": ...,
  "things": [
    {
      "thingkey": "12345-abc",
      "content_length": 5,
      "version": 2
    }, 
    ...
  ]
}
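If "things" is mapped as a nested field, each array entry is indexed as its own hidden document, and a nested aggregation can work across all entries. The sketch below builds the request bodies as Python dicts; it assumes Elasticsearch 7.x or later (typeless mappings, `keyword` type). Older versions, which the "string" type in the question suggests, use a different mapping layout (a document-type level, and "string" instead of "keyword"), so adjust accordingly. The aggregation names `total_content_length`, `by_thingkey` and `content_length` are illustrative.

```python
import json

# Request body for creating the index (PUT /<index>), Elasticsearch 7.x+ assumed.
mapping = {
    "mappings": {
        "properties": {
            "stat": {
                "properties": {
                    "state": {"type": "keyword"},
                    "duration": {"type": "double"},
                }
            },
            "things": {
                # Each array entry is indexed as a separate nested document.
                "type": "nested",
                "properties": {
                    "thingkey": {"type": "keyword"},
                    "content_length": {"type": "long"},
                    "version": {"type": "long"},
                },
            },
        }
    }
}

# Search body (POST /<index>/_search): sum content_length over all
# entries, overall and per thingkey.
aggregation = {
    "size": 0,
    "aggs": {
        "things": {
            "nested": {"path": "things"},
            "aggs": {
                "total_content_length": {"sum": {"field": "things.content_length"}},
                "by_thingkey": {
                    "terms": {"field": "things.thingkey"},
                    "aggs": {
                        "content_length": {"sum": {"field": "things.content_length"}}
                    },
                },
            },
        }
    },
}

print(json.dumps(mapping, indent=2))
print(json.dumps(aggregation, indent=2))
```

The design point is that the identifier becomes a value in `things.thingkey` rather than part of a field name, so the mapping stays fixed as new identifiers arrive.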

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM