Elasticsearch: common mapping type and running aggregations based on the data type

We have an Elasticsearch index with the following mapping (showing only the part of the mapping relevant to this question):

"instFields": {
  "properties": {
    "_index": {
      "type": "object"
    },
    "fieldValue": {
      "fields": {
        "raw": {
          "index": "not_analyzed",
          "type": "string"
        }
      },
      "type": "string"
    },
    "sourceFieldId": {
      "type": "integer"
    }
  },
  "type": "nested"
}

As you can see, fieldValue is of type string. In the original data, fieldValue is stored in a JSON column in PostgreSQL. The use case is such that fieldValue can be any valid JsValue: a JsNumber, JsString, JsBoolean, and so on. The problem is that when storing fieldValue in ES it has to have one definite type, so we convert fieldValue to a String while pushing data into Elasticsearch.
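A minimal Scala sketch of that string conversion, assuming a small stand-in for the JsValue hierarchy (the JsString/JsNumber/JsBoolean names suggest a Scala JSON library such as Play JSON, but the post does not say which). `FieldValueEncoder` and `toFieldValue` are hypothetical names. It shows why an avg on fieldValue has nothing numeric to work with: the number 169451 ends up as the text "169451".

```scala
// Minimal stand-in for the application's JsValue type (assumption: the real
// code uses a Scala JSON library with similar case classes; arrays/objects omitted).
sealed trait JsValue
final case class JsString(value: String) extends JsValue
final case class JsNumber(value: BigDecimal) extends JsValue
final case class JsBoolean(value: Boolean) extends JsValue
case object JsNull extends JsValue

// Hypothetical helper: flattens any JsValue into the single string that is
// currently written to instFields.fieldValue.
object FieldValueEncoder {
  def toFieldValue(v: JsValue): String = v match {
    case JsString(s)  => s            // stored without JSON quotes
    case JsNumber(n)  => n.toString   // 169451 becomes "169451"
    case JsBoolean(b) => b.toString   // true becomes "true"
    case JsNull       => "null"
  }
}
```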

The following is sample data from Elasticsearch:

"instFields": [
  {
    "sourceFieldId": 1233,
    "fieldValue": "Demo Logistics LLC"
  },
  {
    "sourceFieldId": 1236,
    "fieldValue": "169451"
  }
]

This is where it gets interesting: we now want to run various metrics aggregations on fieldValue, e.g. if sourceFieldId = 1236, run an avg aggregation on fieldValue. The problem is that fieldValue had to be stored as a string in ES because it is a JsValue field in the application. What's the best way to create the mapping in Elasticsearch so that fieldValue can be stored with an appropriate type instead of string, so that metrics aggregations can be run on fieldValue values that are actually longs (though encoded as strings in ES)?

One way to achieve this is to create separate fields in Elasticsearch for each possible JsValue type (e.g. JsNumber, JsBoolean, JsString). While indexing, the application can then inspect each JsValue to determine whether it is a JsString, JsNumber, JsBoolean, etc.

On the application side, I can decode the proper type of the fieldValue being indexed:

value match {
  case JsString(s)  => // fieldValue is a string
  case JsNumber(n)  => // fieldValue is a number
  case JsBoolean(b) => // fieldValue is a boolean
}

Now modify the mapping in Elasticsearch and add more fields, each with the proper type, as shown below:

"instFields": {
  "properties": {
    "_index": {
      "type": "object"
    },
    "fieldBoolean": {
      "type": "boolean"
    },
    "fieldDate": {
      "fields": {
        "raw": {
          "format": "dateOptionalTime",
          "type": "date"
        }
      },
      "format": "dateOptionalTime",
      "type": "date"
    },
    "fieldDouble": {
      "fields": {
        "raw": {
          "type": "double"
        }
      },
      "type": "double"
    },
    "fieldLong": {
      "fields": {
        "raw": {
          "type": "long"
        }
      },
      "type": "long"
    },
    "fieldString": {
      "fields": {
        "raw": {
          "index": "not_analyzed",
          "type": "string"
        }
      },
      "type": "string"
    },
    "fieldValue": {
      "fields": {
        "raw": {
          "index": "not_analyzed",
          "type": "string"
        }
      },
      "type": "string"
    },
    "sourceFieldId": {
      "type": "integer"
    }
  },
  "type": "nested"
}

Then, at indexing time:

value match {
  case JsString(s)  => // populate fieldString
  case JsNumber(n)  => // populate fieldDouble (there is also fieldLong)
  case JsBoolean(b) => // populate fieldBoolean
}
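A minimal sketch of what that indexing step could produce for one nested instFields entry, again assuming a small stand-in for the JsValue types. `TypedInstField` and `toDoc` are hypothetical names, and routing integral numbers to fieldLong and all other numbers to fieldDouble is an assumption, since the post only notes that both fields exist. fieldValue is still filled with the string form, so untyped queries keep working.

```scala
// Minimal stand-in for the application's JsValue type (assumption; the real
// code presumably uses a Scala JSON library with these case classes).
sealed trait JsValue
final case class JsString(value: String) extends JsValue
final case class JsNumber(value: BigDecimal) extends JsValue
final case class JsBoolean(value: Boolean) extends JsValue

// Hypothetical helper: builds one nested instFields entry with the untyped
// fieldValue plus exactly one typed field chosen from the runtime JsValue type.
object TypedInstField {
  def toDoc(sourceFieldId: Int, v: JsValue): Map[String, Any] = {
    // (string form for fieldValue, typed field name, typed value)
    val (asString, typedField, typedValue) = v match {
      case JsString(s) => (s, "fieldString", s)
      // Assumption: numbers that are exactly representable as a Long go to
      // fieldLong; everything else (e.g. 1.5) goes to fieldDouble.
      case JsNumber(n) if n.isValidLong => (n.toString, "fieldLong", n.toLong)
      case JsNumber(n)  => (n.toString, "fieldDouble", n.toDouble)
      case JsBoolean(b) => (b.toString, "fieldBoolean", b)
    }
    Map[String, Any](
      "sourceFieldId" -> sourceFieldId,
      "fieldValue"    -> asString, // untyped copy, kept as before
      typedField      -> typedValue
    )
  }
}
```

For the sample entry with sourceFieldId 1236, this yields fieldValue "169451" alongside fieldLong 169451, which a numeric aggregation can use directly.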

This way, boolean values are stored in fieldBoolean, numbers in fieldLong or fieldDouble, and so on. Running metrics aggregations now becomes routine: run them against fieldLong or fieldDouble, depending on the query use case. Note that the fieldValue field is still in the ES mapping and index as before. The application continues to convert each value to a string and store it in fieldValue, so queries that don't care about types can simply query the fieldValue field.
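A minimal sketch of such a query body, built as a Scala string: a nested aggregation on instFields, a filter aggregation restricted to one sourceFieldId, and an avg on the typed field. It assumes the revised mapping with instFields mapped as nested; `AvgQuery`, `avgBody`, and the aggregation names (`inst_fields`, `only_this_field`, `avg_value`) are hypothetical, and the body would be sent to the index's `_search` endpoint (the index name is not given in the post).

```scala
// Hypothetical helper: builds the _search request body that averages the
// typed value of a single sourceFieldId inside the nested instFields docs.
object AvgQuery {
  // typedField is expected to be a numeric field such as "fieldLong" or "fieldDouble".
  def avgBody(sourceFieldId: Int, typedField: String): String =
    s"""{
       |  "size": 0,
       |  "aggs": {
       |    "inst_fields": {
       |      "nested": { "path": "instFields" },
       |      "aggs": {
       |        "only_this_field": {
       |          "filter": { "term": { "instFields.sourceFieldId": $sourceFieldId } },
       |          "aggs": {
       |            "avg_value": { "avg": { "field": "instFields.$typedField" } }
       |          }
       |        }
       |      }
       |    }
       |  }
       |}""".stripMargin
}
```

The nested aggregation is needed because each instFields entry is indexed as a separate hidden document; the filter inside it ensures the average only covers entries whose sourceFieldId matches.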

It sounds like you should have two separate fields: one for when the value is a string and one for when it is a number.

Depending on how you're indexing this data, this can be easy or hard. However, it's a bit strange to have a field that could be either a string or a number.

Regardless, Elasticsearch is not going to be able to do both in a single field.
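If only the already-stringified fieldValue is available at indexing time, one way to fill the separate fields is to sniff the string, as in this minimal sketch (requires Scala 2.13+ for `toLongOption`/`toDoubleOption`). `StringOrNumber` and `route` are hypothetical names, and the field names reuse the question's revised mapping; string sniffing is a swapped-in technique, not something either post describes.

```scala
// Hypothetical helper: picks which typed field to populate alongside
// fieldValue by trying to parse the stored string (Scala 2.13+).
object StringOrNumber {
  def route(fieldValue: String): (String, Any) = {
    val t = fieldValue.trim
    t.toLongOption match {
      case Some(l) => "fieldLong" -> l
      case None =>
        t.toDoubleOption match {
          // NaN and Infinity parse as doubles but are kept as strings here.
          case Some(d) if !d.isNaN && !d.isInfinite => "fieldDouble" -> d
          case _                                    => "fieldString" -> fieldValue
        }
    }
  }
}
```

A drawback of sniffing is that a genuine JsString such as "169451" would be indexed as a number; decoding the original JsValue at indexing time avoids that ambiguity.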
