Elasticsearch: Parent-child relationship after rollover

Suppose there is a simple blog index that contains two types, blog and comment. One blog can have multiple comments. The index is created like this:

curl -X PUT \
  'http://localhost:9200/%3Cblog-%7Bnow%2Fd%7D-000001%3E?pretty=' \
  -H 'content-type: application/json' \
  -d '{
    "mappings": {
        "comment": {
            "_parent": { "type": "blog" },
            "properties": { 
                "name": { "type": "keyword" },
                "comment": { "type": "text" }
            }
        },
        "blog": {
            "properties": {
                "author": { "type": "keyword" },
                "subject": { "type": "text" },
                "content": { "type": "text" }
            }
        }
    }
}'

The index name %3Cblog-%7Bnow%2Fd%7D-000001%3E is the URL-encoded form of <blog-{now/d}-000001> (see the Elasticsearch documentation on date math in index names). Next, we add the alias 'blog-active' to this index. This alias is used for writing data.

curl -X POST 'http://localhost:9200/_aliases?pretty=' \
  -H 'content-type: application/json' \
  -d '{ "actions" : [ { "add" : { "index" : "blog-*", "alias" : "blog-active" } } ] }'

Now if we do the following actions:

1. Add a blog using the blog-active alias

curl -X POST http://localhost:9200/blog-active/blog/1 \
  -H 'content-type: application/json' \
  -d '{
      "author": "author1",
      "subject": "subject1",
      "content": "content1"
  }'

2. Add a comment to the blog

curl -X POST \
  'http://localhost:9200/blog-active/comment/1?parent=1' \
  -H 'content-type: application/json' \
  -d '{
  "name": "commenter1",
  "comment": "new comment1"
}'

3. Do a rollover with max_docs = 2

curl -X POST \
  http://localhost:9200/blog-active/_rollover \
  -H 'content-type: application/json' \
  -d '{
  "conditions": {
    "max_docs": 2
  },
  "mappings": {
    "comment": {
      "_parent": { "type": "blog" },
      "properties": {
        "name": { "type": "keyword" },
        "comment": { "type": "text" }
      }
    },
    "blog": {
      "properties": {
        "author": { "type": "keyword" },
        "subject": { "type": "text" },
        "content": { "type": "text" }
      }
    }
  }
}'

4. Add another comment to the blog

curl -X POST \
  'http://localhost:9200/blog-active/comment/1?parent=1' \
  -H 'content-type: application/json' \
  -d '{
  "name": "commenter2",
  "comment": "new comment2"
}'

Now, if we search all blog indices for comments on blogs written by 'author1' (blog-%2A is the URL-encoded form of blog-*):

curl -X POST \
  http://localhost:9200/blog-%2A/comment/_search \
  -H 'content-type: application/json' \
  -d '{
  "query": {
      "has_parent" : {
        "query" : {
          "match" : { "author" : { "query" : "author1" } }
        },
        "parent_type" : "blog"
      }
  }
}'

the result contains only the first comment.

This is because the second comment lives in the second index, which does not contain the parent blog document. The has_parent query therefore has no blog in that index to match the author against.
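One way to confirm where each document ended up is to list the index, type and ID of every hit behind blog-*. The following is only a sketch: it assumes the same local node as the commands above, `list_doc_locations` is a hypothetical helper name, and `filter_path` is used just to trim the response down to hit metadata (the `_parent` entry is simply omitted from the output if the node does not return it).

```shell
# Assumption: Elasticsearch is reachable at the same address used above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print index, type, id and parent for every document behind blog-*.
list_doc_locations() {
  curl -s -X GET \
    "$ES_URL/blog-%2A/_search?filter_path=hits.hits._index,hits.hits._type,hits.hits._id,hits.hits._parent&pretty"
}
```

Calling `list_doc_locations` after step 4 should show the blog and the first comment in the original index, and the second comment alone in the rolled-over index.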


So, my question is how do I approach parent-child relations when rollover is used?

Is the relationship even possible in that case?

Similar question: ElasticSearch parent/child on different indexes

All documents that form part of a parent-child relationship need to live in the same index, and more precisely on the same shard. Therefore it is not possible to keep a parent-child relationship across a rollover, since rollover creates a new index.
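To make the same-shard requirement concrete: within one index, Elasticsearch chooses the shard from a hash of the routing value, and a child document is routed by its parent's ID. The snippet below is a toy model only. `shard_for` is a hypothetical helper, `cksum` stands in for the hash Elasticsearch really uses, and 5 primary shards is an assumed setting.

```shell
# Toy model of routing: shard = hash(routing) % number_of_primary_shards.
# cksum (a CRC) is a stand-in for Elasticsearch's real hash; illustration only.
shard_for() {
  routing=$1
  shards=$2
  hash=$(printf '%s' "$routing" | cksum | cut -d' ' -f1)
  echo $(( hash % shards ))
}

# Blog 1 is routed by its own ID; a comment indexed with ?parent=1 is routed by "1" as well.
blog_shard=$(shard_for 1 5)
comment_shard=$(shard_for 1 5)
echo "blog on shard $blog_shard, comment on shard $comment_shard"
```

The routing value only selects a shard number inside whichever index receives the write. After a rollover the alias points at a new index, so the comment lands on a shard that has never seen its parent.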

One solution to the problem above is to denormalize the data by adding the fields blog_author and blog_id to the comment type. The mapping in that case looks like this (notice that the parent-child relationship has been removed):

"mappings": {
  "comment": {
    "properties": {
      "blog_id": { "type": "keyword" },
      "blog_author": { "type": "keyword" },
      "name": { "type": "keyword" },
      "comment": { "type": "text" }
    }
  },
  "blog": {
    "properties": {
      "author": { "type": "keyword" },
      "subject": { "type": "text" },
      "content": { "type": "text" }
    }
  }
}

and the query to fetch comments by blog author is:

curl -X POST \
  http://localhost:9200/blog-%2A/comment/_search \
  -H 'cache-control: no-cache' \
  -H 'content-type: application/json' \
  -d '{
  "query": {
    "match": {
        "blog_author": "author1"
    }
  }
}'
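With this mapping, each comment has to carry the blog's ID and author at index time. A minimal sketch of how that could look, assuming the same local node and the blog-active write alias from the question; `comment_body`, `index_comment` and `comments_for_blog` are hypothetical helper names, not part of Elasticsearch.

```shell
# Assumption: Elasticsearch is reachable at the same address used above.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the comment body with the blog's ID and author copied in.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
comment_body() {
  printf '{"blog_id":"%s","blog_author":"%s","name":"%s","comment":"%s"}' "$1" "$2" "$3" "$4"
}

# Index a comment through the write alias; no ?parent= parameter is needed,
# and the document ID is generated by Elasticsearch.
index_comment() {
  curl -s -X POST "$ES_URL/blog-active/comment" \
    -H 'content-type: application/json' \
    -d "$(comment_body "$@")"
}

# Fetch all comments of one blog across every blog-* index.
comments_for_blog() {
  curl -s -X POST "$ES_URL/blog-%2A/comment/_search" \
    -H 'content-type: application/json' \
    -d "{\"query\":{\"term\":{\"blog_id\":\"$1\"}}}"
}
```

For example, `index_comment 1 author1 commenter2 'new comment2'` followed by `comments_for_blog 1` would work regardless of which index the rollover has made active. The trade-off is that a change to the blog's author must be propagated to its comments by the application.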
