Logstash: Migrating from one Elasticsearch cluster to another adds extra fields to the documents

I have been migrating one of our indexes from a self-hosted Elasticsearch to Amazon Elasticsearch using Logstash. After the migration succeeded, we found that some additional fields had been added to the documents. How can we prevent them from being added?

Our Logstash config file

input {
    elasticsearch {
        hosts => ["https://staing-example.com:443"]
        user => "userName"
        password => "password"
        index => "testingindex"
        size => 100
        scroll => "1m"
    }
}

filter {

}

output {
    amazon_es {
        hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
        region => "us-east-1"
        aws_access_key_id => "access_key_id"
        aws_secret_access_key => "secret_access_key"
        index => "testingindex"
    }
    stdout {
        codec => rubydebug
    }
}

The document in our self-hosted Elasticsearch

{
        "_index": "testingindex",
        "_type": "interaction-3",
        "_id": "38b23e7a-eafd-4163-a9f0-e2d9ffd5d2cf",
        "_score": 1,
        "_source": {
           "customerId" : [
            "e177c1f8-1fbd-4b2e-82b8-760536e42742"
          ],
          "customProperty" : {
            "messageFrom" : [
              "BOT"
            ]
          },
          "userId" : [
            "e177c1f8-1fbd-4b2e-82b8-760536e42742"
          ],
          "uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
          "accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
        }
      }

The document in our Amazon Elasticsearch

   {
        "_index" : "testingindex",
        "_type" : "doc",
        "_id" : "B-hP020Bd2lcvg9lTyBH",
        "_score" : 1.0,
        "_source" : {
          "customerId" : [
            "e177c1f8-1fbd-4b2e-82b8-760536e42742"
          ],
          "customProperty" : {
            "messageFrom" : [
              "BOT"
            ]
          },
          "@version" : "1",
          "userId" : [
            "e177c1f8-1fbd-4b2e-82b8-760536e42742"
          ],
          "@timestamp" : "2019-10-16T06:44:13.154Z",
          "uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
          "accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
        }
      }

@version and @timestamp are the two new fields being added to the documents.

Can anyone explain why they are being added, and is there any way to prevent this? Comparing the two documents, the _type and _id have also changed. We need both _type and _id to be the same as in the documents in our self-hosted Elasticsearch.

The fields @version and @timestamp are generated by Logstash. If you don't want them, you will need to use a mutate filter to remove them.

filter {
    mutate {
        remove_field => ["@version", "@timestamp"]
    }
}

To keep the _type and _id of your original documents, you will need to change your input and add the option docinfo => true, which puts those fields into the @metadata field, and then reference them in your output. The documentation has an example of this.

input {
    elasticsearch {
        ...
        docinfo => true
    }
}

output {
    elasticsearch {
        ...
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
    }
}

Note that if your Amazon Elasticsearch is version 6.X or higher, you can only have one document type per index, and version 7.X is typeless. Also, the document_type option is deprecated in Logstash 7.X.
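
Putting the two parts together for the pipeline in the question, a minimal sketch could look like the following. It assumes the amazon_es output accepts the same document_type and document_id options as the elasticsearch output (worth checking against the installed plugin version) and that the target index holds a single document type. Hosts and credentials are the placeholders from the question.

input {
    elasticsearch {
        hosts => ["https://staing-example.com:443"]
        user => "userName"
        password => "password"
        index => "testingindex"
        size => 100
        scroll => "1m"
        # Copy _index, _type and _id of each source document into [@metadata]
        docinfo => true
    }
}

filter {
    mutate {
        # Drop the fields Logstash adds to every event
        remove_field => ["@version", "@timestamp"]
    }
}

output {
    amazon_es {
        hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
        region => "us-east-1"
        aws_access_key_id => "access_key_id"
        aws_secret_access_key => "secret_access_key"
        index => "testingindex"
        # Assumption: these two options behave as in the elasticsearch output
        document_type => "%{[@metadata][_type]}"
        document_id => "%{[@metadata][_id]}"
    }
}

Because the type and id are read from @metadata, they are not written into the _source of the migrated documents.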
