ElasticSearch: How to move a field to a different level with existing data?

Say I have:

PUT /test/_doc/1
{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch",
    "data": {
        "modified_date": "2018-11-15T14:12:12",
        "password": "abcpassword"
    }
}

Then I get the following mapping:

GET /test/_mapping/_doc
{
    "test": {
        "mappings": {
            "_doc": {
                "properties": {
                    "data": {
                        "properties": {
                            "modified_date": {
                                "type": "date"
                            },
                            "password": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    }
                                }
                            }
                        }
                    },
                    "message": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    },
                    "post_date": {
                        "type": "date"
                    },
                    "user": {
                        "type": "text",
                        "fields": {
                            "keyword": {
                                "type": "keyword",
                                "ignore_above": 256
                            }
                        }
                    }
                }
            }
        }
    }
}

How can I reindex so that modified_date moves to the same level as user, without losing any data? The resulting document should look like this:

{
    "user" : "kimchy",
    "post_date" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch",
    "modified_date": "2018-11-15T14:12:12",
    "data": {
        "password": "abcpassword"
    }
}

I'd suggest using an Ingest Node and Pipelines. Both are described in the Elasticsearch documentation.

The idea is to define a pipeline and reference it when indexing or reindexing, so that each document goes through the pre-processing defined in the pipeline before it is stored in the destination index.

I've created the pipeline below for your use case. It adds a new top-level field modified_date with the value of data.modified_date, and then removes data.modified_date. Fields that are not mentioned in the pipeline are not modified and are ingested into the destination index as is.

Create/Add Pipeline

PUT _ingest/pipeline/mydatepipeline
{
  "description" : "modified date pipeline",
  "processors" : [
    {
      "set" : {
        "field": "modified_date",
        "value": "{{data.modified_date}}"
      }
    },
    {
      "remove": {
        "field": "data.modified_date"
      }
    }
  ]
}
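
Before reindexing, it may be worth checking what the pipeline does to a sample document. The sketch below uses the simulate pipeline API against the pipeline defined above; the sample document is simply the one from the question, and nothing is written to any index.

```
# Dry run: the document is passed through mydatepipeline but not stored
POST _ingest/pipeline/mydatepipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "user": "kimchy",
        "post_date": "2009-11-15T14:12:12",
        "message": "trying out Elasticsearch",
        "data": {
          "modified_date": "2018-11-15T14:12:12",
          "password": "abcpassword"
        }
      }
    }
  ]
}
```

If the pipeline behaves as intended, the _source in the response should show modified_date at the top level and data containing only password.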

Once the above pipeline is created, use it to perform the reindexing.

Usage 1: During Reindexing to New Index

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test_dest",
    "pipeline": "mydatepipeline"
  }
}

The documents will be transformed into the shape you expect and indexed into the test_dest index. Note that you need to explicitly create test_dest beforehand, with the mapping you require.
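
As a rough sketch of that step, test_dest could be created as follows before running the reindex. The field types here are an assumption: they simply mirror the dynamic mapping shown in the question, with modified_date moved to the top level. The _doc level under mappings matches the typed mapping in the question (Elasticsearch 6.x); on 7.x and later, put properties directly under mappings instead.

```
# Assumed mapping: same types as the question's mapping, modified_date at the top level
PUT test_dest
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "post_date": { "type": "date" },
        "message": {
          "type": "text",
          "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
        },
        "modified_date": { "type": "date" },
        "data": {
          "properties": {
            "password": {
              "type": "text",
              "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
            }
          }
        }
      }
    }
  }
}
```

After the reindex has finished, GET test_dest/_doc/1 should return the document in its new shape.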

Usage 2: Using pipeline during bulk operations before indexing

You can use it during a bulk operation as follows:

POST _bulk?pipeline=mydatepipeline
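
For illustration, a complete bulk request with a body might look like the sketch below. The target index, type and id are assumptions taken from the question's example; on Elasticsearch 7.x and later, drop "_type" from the action line. Each action line is followed by its document on a single line, and the body must end with a newline.

```
# Every document in this bulk request passes through mydatepipeline
POST _bulk?pipeline=mydatepipeline
{ "index" : { "_index" : "test", "_type" : "_doc", "_id" : "1" } }
{ "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch", "data" : { "modified_date" : "2018-11-15T14:12:12", "password" : "abcpassword" } }
```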

Usage 3: Using the pipeline on individual docs during indexing

PUT test/_doc/1?pipeline=mydatepipeline
{
  "user" : "kimchy",
  "post_date" : "2009-11-15T14:12:12",
  "message" : "trying out Elasticsearch",
  "data": {
      "modified_date": "2018-11-15T14:12:12",
      "password": "abcpassword"
  }
}

For both Usage 2 and Usage 3, you need to make sure the mapping of the target index is created accordingly.

Hope this helps!
