ElasticSearch: Index only the fields specified in the mapping
I have an ElasticSearch setup that receives data to index via a CouchDB river. My problem is that most of the fields in the CouchDB documents are not relevant for search: they are fields used internally by the application (IDs and so on), and I do not want to get false positives because of them. Besides, indexing data that is not needed seems to me a waste of resources.
To solve this problem, I have defined a mapping in which I specify the fields that I want to be indexed. I am using pyes to access ElasticSearch. The process that I follow is:
This is the index definition as obtained by:
curl -XGET http://localhost:9200/notes_index/_mapping?pretty=true
{
  "notes_index" : {
    "default_mapping" : {
      "properties" : {
        "note_text" : {
          "type" : "string"
        }
      }
    },
    "couchdb" : {
      "properties" : {
        "_rev" : {
          "type" : "string"
        },
        "created_at_date" : {
          "format" : "dateOptionalTime",
          "type" : "date"
        },
        "note_text" : {
          "type" : "string"
        },
        "organization_id" : {
          "type" : "long"
        },
        "user_id" : {
          "type" : "long"
        },
        "created_at_time" : {
          "type" : "long"
        }
      }
    }
  }
}
The problem that I have is manifold:
Do you have any advice on this?
This is what I am actually doing, exactly as typed:
server="localhost"
# Create the index
curl -XPUT "$server:9200/index1"
# Create the mapping
curl -XPUT "$server:9200/index1/mapping1/_mapping" -d '
{
  "type1" : {
    "properties" : {
      "note_text" : {"type" : "string", "store" : "no"}
    }
  }
}
'
# Configure the river
curl -XPUT "$server:9200/_river/river1/_meta" -d '{
  "type" : "couchdb",
  "couchdb" : {
    "host" : "localhost",
    "port" : 5984,
    "user" : "admin",
    "password" : "admin",
    "db" : "notes"
  },
  "index" : {
    "index" : "index1",
    "type" : "type1"
  }
}'
The documents in index1 still contain fields other than "note_text", which is the only one that I have specifically mentioned in the mapping definition. Why is that?
The default behavior of the CouchDB river is to use a 'dynamic' mapping, i.e. to index all the fields that are found in the incoming CouchDB documents. You're right that this can unnecessarily increase the size of the index (your problems with search can be solved by excluding some fields from the query).
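As a rough sketch of that query-side workaround, the snippet below builds a `query_string` search body that only matches against `note_text`. The helper name `build_note_query` and the sample search term are illustrative assumptions, not part of the original setup; the resulting JSON would be sent to the index's `_search` endpoint.

```python
import json


def build_note_query(text, fields=("note_text",)):
    """Build a search body that only looks at the given fields.

    Hypothetical helper: the name and defaults are illustrative.
    """
    return {
        "query": {
            "query_string": {
                "fields": list(fields),  # restrict matching to these fields
                "query": text,
            }
        }
    }


if __name__ == "__main__":
    # Print the JSON that could be sent to /notes_index/_search
    print(json.dumps(build_note_query("meeting"), indent=2))
```

This only narrows what the query looks at; the other fields are still indexed, so it does not address the index size.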
To use your own mapping instead of the 'dynamic' one, you need to configure the River plugin to use the mapping you've created (see this article):
curl -XPUT 'elasticsearch-host:9200/_river/notes_index/_meta' -d '{
  "type" : "couchdb",
  ... your CouchDB connection configuration ...
  "index" : {
    "index" : "notes_index",
    "type" : "mapping1"
  }
}'
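Pointing the river at a type does not by itself stop new fields from being added to that type's mapping. One option, which is my own assumption and not something the configuration above does, is to set `"dynamic": false` in the mapping, so that fields not listed under `properties` are not indexed (they still remain in `_source`). A minimal sketch of building such a mapping body follows; `build_static_mapping` is a hypothetical helper name.

```python
import json


def build_static_mapping(type_name, properties):
    """Return a mapping body with dynamic field detection turned off.

    Hypothetical helper; "dynamic": False serializes to JSON false.
    """
    return {
        type_name: {
            "dynamic": False,          # do not index fields missing from "properties"
            "properties": properties,  # the only fields to be indexed
        }
    }


# The type name must match the one used in the mapping URL and in the river config.
mapping = build_static_mapping("mapping1", {"note_text": {"type": "string"}})
print(json.dumps(mapping, indent=2))
```

The printed JSON could be used as the body of the mapping PUT request for the same type name.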
The name of the type that you specify in the URL when doing the mapping PUT overrides the one that you include in the definition, so the type that you're creating is in fact mapping1. Try executing this command to see for yourself:
> curl 'localhost:9200/index1/_mapping?pretty=true'
{
  "index1" : {
    "mapping1" : {
      "properties" : {
        "note_text" : {
          "type" : "string"
        }
      }
    }
  }
}
I think that if you get the name of the type right, it will start working fine.
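To make the mismatch concrete, here is a small sketch that compares a `_mapping` response with a river configuration and reports whether the river's target type actually exists in the index. The function name `river_type_is_mapped` is a hypothetical helper; the sample dictionaries mirror the mapping output and river settings shown in this thread.

```python
def river_type_is_mapped(mapping_response, river_meta):
    """Check whether the type the river writes to has an explicit mapping."""
    index_name = river_meta["index"]["index"]
    type_name = river_meta["index"]["type"]
    return type_name in mapping_response.get(index_name, {})


# Shape of the response from GET /index1/_mapping, as shown above
mapping_response = {
    "index1": {"mapping1": {"properties": {"note_text": {"type": "string"}}}}
}
# River configuration from the question: it targets "type1"
river_meta = {"type": "couchdb", "index": {"index": "index1", "type": "type1"}}

print(river_type_is_mapped(mapping_response, river_meta))  # False: "type1" has no mapping

# Point the river at the type that was actually created
river_meta["index"]["type"] = "mapping1"
print(river_type_is_mapped(mapping_response, river_meta))  # True
```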