Logstash: Migrating from one Elasticsearch to another Elasticsearch results in some additional properties
I have been migrating one of our indexes from a self-hosted Elasticsearch to Amazon Elasticsearch using Logstash. After a successful migration, we found that some additional fields are being added to the documents. How can we prevent them from being added?
Our Logstash config file:
input {
  elasticsearch {
    hosts => ["https://staing-example.com:443"]
    user => "userName"
    password => "password"
    index => "testingindex"
    size => 100
    scroll => "1m"
  }
}
filter {
}
output {
  amazon_es {
    hosts => ["https://example.us-east-1.es.amazonaws.com:443"]
    region => "us-east-1"
    aws_access_key_id => "access_key_id"
    aws_secret_access_key => "access_key_id"
    index => "testingindex"
  }
  stdout {
    codec => rubydebug
  }
}
The document in our self-hosted Elasticsearch:
{
"_index": "testingindex",
"_type": "interaction-3",
"_id": "38b23e7a-eafd-4163-a9f0-e2d9ffd5d2cf",
"_score": 1,
"_source": {
"customerId" : [
"e177c1f8-1fbd-4b2e-82b8-760536e42742"
],
"customProperty" : {
"messageFrom" : [
"BOT"
]
},
"userId" : [
"e177c1f8-1fbd-4b2e-82b8-760536e42742"
],
"uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
"accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
}
}
The document in our Amazon Elasticsearch:
{
"_index" : "testingindex",
"_type" : "doc",
"_id" : "B-hP020Bd2lcvg9lTyBH",
"_score" : 1.0,
"_source" : {
"customerId" : [
"e177c1f8-1fbd-4b2e-82b8-760536e42742"
],
"customProperty" : {
"messageFrom" : [
"BOT"
]
},
"@version" : "1",
"userId" : [
"e177c1f8-1fbd-4b2e-82b8-760536e42742"
],
"@timestamp" : "2019-10-16T06:44:13.154Z",
"uniqueIdentifier" : "2b027fc0-a517-49a7-a71f-8732044cb249",
"accountId" : "724bee3e-38f8-4538-b944-f3e21c518437"
}
}
@version and @timestamp are the two new fields being added to the documents.
Can anyone explain why they are added, and is there a way to prevent this? Comparing the two documents, _type and _id also change. We need both _type and _id to stay the same as in the documents in our self-hosted Elasticsearch.
The fields @version and @timestamp are generated by Logstash. If you don't want them, you will need to use a mutate filter to remove them.
# goes inside the filter { } block
mutate {
  remove_field => ["@version", "@timestamp"]
}
To keep the _type and _id of your original documents, you will need to change your input and add the option docinfo => true. This puts those fields into the @metadata field, where you can use them in your output. The documentation has an example for this.
input {
  elasticsearch {
    ...
    docinfo => true
  }
}
output {
  elasticsearch {
    ...
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
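To make the field references above more concrete, here is a small Python sketch of how a reference such as %{[@metadata][_id]} is looked up in an event. It is a toy model for illustration only, not Logstash's actual implementation; the resolve function and the sample event are hypothetical, with values taken from the question. It also shows a common pitfall: when docinfo is not enabled, the reference cannot be resolved and the literal text ends up as the document ID.

```python
import re

# Toy model only: mimics how a "%{[a][b]}" field reference is looked up in an
# event. This is an illustration, not Logstash's actual implementation.
FIELD_REF = re.compile(r"%\{((?:\[[^\]]+\])+)\}")


def resolve(template, event):
    """Replace each %{[a][b]} reference with the matching event value."""
    def lookup(match):
        value = event
        for key in re.findall(r"\[([^\]]+)\]", match.group(1)):
            if not isinstance(value, dict) or key not in value:
                return match.group(0)  # unresolved references stay literal
            value = value[key]
        return str(value)
    return FIELD_REF.sub(lookup, template)


# Event as it might look with docinfo => true (values from the question).
event = {
    "accountId": "724bee3e-38f8-4538-b944-f3e21c518437",
    "@metadata": {
        "_index": "testingindex",
        "_type": "interaction-3",
        "_id": "38b23e7a-eafd-4163-a9f0-e2d9ffd5d2cf",
    },
}

doc_id = resolve("%{[@metadata][_id]}", event)
doc_type = resolve("%{[@metadata][_type]}", event)
# Without docinfo there is no such metadata, so the reference stays as-is.
missing = resolve("%{[@metadata][_id]}", {"accountId": "x"})
```

If the migrated documents show up with an _id that is literally %{[@metadata][_id]}, that usually means the metadata was not populated by the input.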
Note that if your Amazon Elasticsearch is version 6.X or higher, you can only have one document type per index, and version 7.X is typeless. Also, Logstash version 7.X does not have the document_type option anymore.
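After re-running the migration, one way to confirm the fix is to compare a hit from the source cluster with the same hit from the target. The sketch below is a hypothetical helper, not part of Logstash or any Elasticsearch client; it works on plain dictionaries shaped like the search hits shown in the question (trimmed to a few fields here) and reports the differences this question is about.

```python
def compare_hits(source_hit, target_hit):
    """Return a list of human-readable differences between two search hits."""
    problems = []
    # _type and _id should survive the migration unchanged.
    for key in ("_type", "_id"):
        if source_hit[key] != target_hit[key]:
            problems.append(f"{key}: {source_hit[key]!r} != {target_hit[key]!r}")
    # Fields present only in the target were added along the way.
    extra = sorted(set(target_hit["_source"]) - set(source_hit["_source"]))
    if extra:
        problems.append(f"extra fields: {extra}")
    return problems


# Trimmed versions of the two documents from the question.
source_hit = {
    "_type": "interaction-3",
    "_id": "38b23e7a-eafd-4163-a9f0-e2d9ffd5d2cf",
    "_source": {"accountId": "724bee3e-38f8-4538-b944-f3e21c518437"},
}
target_hit = {
    "_type": "doc",
    "_id": "B-hP020Bd2lcvg9lTyBH",
    "_source": {
        "accountId": "724bee3e-38f8-4538-b944-f3e21c518437",
        "@version": "1",
        "@timestamp": "2019-10-16T06:44:13.154Z",
    },
}

problems = compare_hits(source_hit, target_hit)
for line in problems:
    print(line)
```

With the documents from the question this reports three differences: the changed _type, the changed _id, and the two added fields. An empty list would mean the migrated hit matches on all three checks.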