

Problems with parent/child documents in Elasticsearch

We work with two types of documents in Elasticsearch (ES): items and slots, where items are the parents of slot documents. We define the index with the following command:

curl -XPOST 'localhost:9200/items' -d @itemsdef.json

where itemsdef.json contains the following definition:

{
"mappings" : {
    "item" : {
        "properties" : {
            "id" : {"type" : "long" },
            "name" : {
                "type" : "string",
                "analyzer" : "textIndexAnalyzer"
            },
            "location" : {"type" : "geo_point" }
        }
    }
},
"settings" : {
    "analysis" : {
        "analyzer" : {

                "activityIndexAnalyzer" : {
                    "alias" : ["activityQueryAnalyzer"],
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textIndexAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["word_delimiter_impl", "trim", "lowercase", "asciifolding", "spanish_stop", "spanish_synonym"]
                },
                "textQueryAnalyzer" : {
                    "type" : "custom",
                    "tokenizer" : "whitespace",
                    "filter" : ["trim", "lowercase", "asciifolding", "spanish_stop"]
                }       
        },
        "filter" : {        
                "spanish_stop" : {
                    "type" : "stop",
                    "ignore_case" : true,
                    "enable_position_increments" : true,
                    "stopwords_path" : "analysis/spanish-stopwords.txt"
                },
                "spanish_synonym" : {
                    "type" : "synonym",
                    "synonyms_path" : "analysis/spanish-synonyms.txt"
                },
                "word_delimiter_impl" : {
                    "type" : "word_delimiter",
                    "generate_word_parts" : true,
                    "generate_number_parts" : true,
                    "catenate_words" : true,
                    "catenate_numbers" : true,
                    "split_on_case_change" : false                  
                }               
        }
    }
}
}

Then we add the child document mapping with the following command:

curl -XPOST 'localhost:9200/items/slot/_mapping' -d @slotsdef.json

where slotsdef.json contains the following definition:

{
"slot" : {
    "_parent" : {"type" : "item"},
    "_routing" : {
        "required" : true,
        "path" : "parent_id"
    },
    "properties": {
        "id" : { "type" : "long" },
        "parent_id" : { "type" : "long" },
        "activity" : {
            "type" : "string",
            "analyzer" : "activityIndexAnalyzer"
        },
        "day" : { "type" : "integer" },
        "start" : { "type" : "integer" },
        "end" :  { "type" : "integer" }
    }
}   
}

Finally, we bulk-index the data with the following command:

curl -XPOST 'localhost:9200/items/_bulk' --data-binary @testbulk.json

where testbulk.json contains the following data:

{"index":{"_type": "item", "_id":35}}
{"location":[40.4,-3.6],"id":35,"name":"A Name"}
{"index":{"_type":"slot","_id":126,"_parent":35}}
{"id":126,"start":1330,"day":1,"end":1730,"activity":"An Activity","parent_id":35}
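When generating larger bulk files, it helps to build the `_parent` value in each action line from the same `parent_id` that goes into the source line, so the two can never disagree. The following is a minimal Python sketch under that assumption; `bulk_body` is a hypothetical helper (not part of any Elasticsearch client) that only builds the request body, which would still be sent with the curl command above.

```python
import json


def bulk_body(items, slots):
    """Build a newline-delimited _bulk body: each action line is followed by its source line."""
    lines = []
    for item in items:
        lines.append(json.dumps({"index": {"_type": "item", "_id": item["id"]}}))
        lines.append(json.dumps(item))
    for slot in slots:
        # _parent is taken from the slot's own parent_id field; ES also uses
        # the parent id as the routing value for the child document
        action = {"index": {"_type": "slot", "_id": slot["id"], "_parent": slot["parent_id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(slot))
    # The bulk API expects every line, including the last one, to end with a newline
    return "\n".join(lines) + "\n"


items = [{"location": [40.4, -3.6], "id": 35, "name": "A Name"}]
slots = [{"id": 126, "start": 1330, "day": 1, "end": 1730,
          "activity": "An Activity", "parent_id": 35}]
print(bulk_body(items, slots), end="")
```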

Through the ES Head plugin we can see that the definitions seem to be OK. We tested the analyzers to check that they have been loaded and that they work. Both documents appear in the ES Head browser view. But if we try to retrieve the child document through the API, ES responds that it does not exist:

$ curl -XGET 'localhost:9200/items/slot/126'
{"_index":"items","_type":"slot","_id":"126","exists":false}

When we import 50 documents, every parent document can be retrieved through the API, but only some of the requests for child documents succeed.

My guess is that it has something to do with how documents are distributed across shards and how routing works, which is not clear to me.

Any clue on how to retrieve individual child documents? ES Head shows that they have been stored, but HTTP GET requests to localhost:9200/items/slot/XXX respond seemingly at random with "exists":false.

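Child documents are routed by their parent's id, not by their own. A GET without a routing value is routed by the document's own _id (126 here), which can point to a different shard from the one the child was indexed on; that is why only some of the requests succeed. So, to retrieve a child document, you need to pass the parent id in the routing parameter of your request: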

curl "localhost:9200/items/slot/126?routing=35"
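The following Python sketch illustrates why the GET without routing only works some of the time. It assumes the routing formula `shard = hash(routing) % number_of_primary_shards` described in the Elasticsearch documentation, but uses `zlib.crc32` as a stand-in for ES's real hash function and assumes 5 primary shards (the default in older ES versions), so the concrete shard numbers will not match a real cluster.

```python
import zlib

NUM_SHARDS = 5  # assumption: the pre-7.0 default number of primary shards


def shard_for(routing: str, num_primary_shards: int) -> int:
    # Stand-in hash: ES uses its own hash function, so the numbers
    # produced here are illustrative only
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


# The slot was indexed with _parent=35, so it was routed by "35"
indexed_on = shard_for("35", NUM_SHARDS)
# GET /items/slot/126 with no routing parameter is routed by the _id "126"
get_without_routing = shard_for("126", NUM_SHARDS)
# GET /items/slot/126?routing=35 is routed by "35" again
get_with_routing = shard_for("35", NUM_SHARDS)

print(indexed_on, get_without_routing, get_with_routing)
```

Whatever hash is used, `get_with_routing` always equals `indexed_on`, while `get_without_routing` only matches when "126" and "35" happen to land on the same shard. With `NUM_SHARDS = 1`, every routing value maps to shard 0, which is why a single-shard index also avoids the problem.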

If the parent id is not available, you have to search for the child document instead:

curl "localhost:9200/items/slot/_search?q=id:126"

or switch to an index with a single primary shard, so that every routing value resolves to the same shard.
