How to efficiently retrieve only one field of documents in Elasticsearch

Question

I'm new to Elasticsearch and I'm running into some technical difficulties. My documents are time-series data stored in hourly indexes. I'm trying to figure out how to efficiently extract only the values of the key field, which is mapped as "key": { "type": "long" }. I first tried the naive method of scrolling through all indices and extracting the field, but that doesn't finish anywhere near quickly enough: each hourly index has about 10M docs, and scrolling through just 3 indexes already takes forever.
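If scrolling is kept, much of its cost is fetching and deserializing whole documents. A minimal sketch, assuming the request bodies are built in Go with only the standard library (no Elasticsearch client calls are shown, since the asker's client code is not given): the first scroll request, e.g. POST /<index>/_search?scroll=1m, returns only key from _source and sorts by _doc, which the Elasticsearch scroll documentation describes as the fastest order for scrolling. The helper name scrollBody and the batch size are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// scrollBody builds the body of the first scroll request.
// Only "key" is returned from _source, and sorting by _doc
// skips scoring and sorting work during the scroll.
func scrollBody(batchSize int) ([]byte, error) {
	body := map[string]interface{}{
		"size":    batchSize,
		"_source": []string{"key"},
		"sort":    []string{"_doc"},
	}
	// encoding/json writes map keys in sorted order.
	return json.Marshal(body)
}

func main() {
	b, err := scrollBody(5000)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"_source":["key"],"size":5000,"sort":["_doc"]}
}
```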

Then I turned to terms aggregations and tried using the key field as the aggregation term:

{
  "aggregations": {
    "test_group": {
      "terms": {
        "field": "key",
        "size": 100000
      }
    }
  }
}
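When only the bucket keys are needed, the request can also set the top-level size to 0 so that no search hits are fetched alongside the aggregation. A minimal sketch in Go (standard library only; termsAggBody is a hypothetical helper name) that wraps the aggregation above in a complete request body:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termsAggBody wraps the "test_group" terms aggregation on "key"
// in a full search body. The top-level "size": 0 means no hits
// are returned, only the aggregation buckets.
func termsAggBody(bucketSize int) ([]byte, error) {
	body := map[string]interface{}{
		"size": 0,
		"aggregations": map[string]interface{}{
			"test_group": map[string]interface{}{
				"terms": map[string]interface{}{
					"field": "key",
					"size":  bucketSize,
				},
			},
		},
	}
	return json.Marshal(body)
}

func main() {
	b, err := termsAggBody(100000)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
	// {"aggregations":{"test_group":{"terms":{"field":"key","size":100000}}},"size":0}
}
```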

That gives me better performance, but it is still not fast enough for a real-time system where users search through the history, because key is a high-cardinality field. Some rough benchmarks:

terms size | indices | time range | query time
50k        | 4       | 3 h        | 7.1 s
100k       | 4       | 3 h        | 7.669 s
1M         | 4       | 2 h        | 12.669 s
1M         | 4       | 3 h        | 14.669 s

And that isn't the whole cost: I'm using the Elasticsearch Go library to parse the output and do some processing, which adds non-trivial time to the overall response.
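On the Go side, decoding only the parts of the response that are actually needed keeps parsing cheap. A minimal sketch using encoding/json with typed structs, assuming the aggregation is named test_group as above; the type and function names are hypothetical, and this does not use the Elasticsearch Go client mentioned in the question. Because key is a long, bucket keys arrive as JSON integers and are decoded into int64.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// termsResponse models only the parts of a search response needed to
// read the "test_group" buckets; all other fields are ignored.
type termsResponse struct {
	Aggregations struct {
		TestGroup struct {
			Buckets []struct {
				Key      int64 `json:"key"`
				DocCount int64 `json:"doc_count"`
			} `json:"buckets"`
		} `json:"test_group"`
	} `json:"aggregations"`
}

// bucketKeys extracts the bucket keys from a raw search response body.
func bucketKeys(raw []byte) ([]int64, error) {
	var resp termsResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	buckets := resp.Aggregations.TestGroup.Buckets
	keys := make([]int64, 0, len(buckets))
	for _, b := range buckets {
		keys = append(keys, b.Key)
	}
	return keys, nil
}

func main() {
	// Hypothetical, abbreviated response body.
	raw := []byte(`{"took":3,"aggregations":{"test_group":{"buckets":[{"key":42,"doc_count":7},{"key":1001,"doc_count":2}]}}}`)
	keys, err := bucketKeys(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [42 1001]
}
```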

My question is: is this already the best ES can do? Are there other approaches I'm missing? I'm currently on ES 5.6 with a 3-node cluster, all on Amazon i3-4xl instances. Thanks.

Answer 1

If I understand your question correctly, you are trying to retrieve a specific field called 'key' from your documents, and I assume the returned documents also contain other fields that you don't care about?

If so, try this:

GET /_search
{
    "_source": {
        "includes": ["key"]
    }
}
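With this source filtering, each hit's _source contains only key. A minimal sketch of reading those values in Go with encoding/json, assuming the standard hits.hits[]._source response layout; sourceKeys is a hypothetical helper and the sample response is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// searchResponse models only hits.hits[]._source.key.
type searchResponse struct {
	Hits struct {
		Hits []struct {
			Source struct {
				Key int64 `json:"key"`
			} `json:"_source"`
		} `json:"hits"`
	} `json:"hits"`
}

// sourceKeys returns the "key" value of every hit's filtered _source.
// A hit whose _source lacks "key" decodes as 0.
func sourceKeys(raw []byte) ([]int64, error) {
	var resp searchResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	keys := make([]int64, 0, len(resp.Hits.Hits))
	for _, h := range resp.Hits.Hits {
		keys = append(keys, h.Source.Key)
	}
	return keys, nil
}

func main() {
	raw := []byte(`{"hits":{"total":2,"hits":[{"_id":"1","_source":{"key":5}},{"_id":"2","_source":{"key":9}}]}}`)
	keys, err := sourceKeys(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [5 9]
}
```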

Answer 2

I am not exactly sure what you are trying to achieve, but to retrieve a single field without parsing it out of the _source field, you can set that field's store mapping parameter to true.

See the documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#number-params
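A minimal sketch of that approach, assuming ES 5.x (where mappings are still nested under a type name; doc here is hypothetical) and building the JSON bodies in Go with the standard library: the mapping marks key as stored, and the search requests it with stored_fields while disabling _source; stored values come back under each hit's fields as arrays. As far as I know, store cannot be changed on an existing field, so this would apply to newly created hourly indices. All helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mappingBody returns a mapping that marks "key" as a stored long field.
// ES 5.x nests properties under a mapping type; "doc" is a hypothetical name.
func mappingBody() ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"mappings": map[string]interface{}{
			"doc": map[string]interface{}{
				"properties": map[string]interface{}{
					"key": map[string]interface{}{"type": "long", "store": true},
				},
			},
		},
	})
}

// storedFieldsBody asks for the stored "key" field only and skips _source.
func storedFieldsBody() ([]byte, error) {
	return json.Marshal(map[string]interface{}{
		"_source":       false,
		"stored_fields": []string{"key"},
	})
}

// storedResponse models hits.hits[].fields.key; stored values come back as arrays.
type storedResponse struct {
	Hits struct {
		Hits []struct {
			Fields struct {
				Key []int64 `json:"key"`
			} `json:"fields"`
		} `json:"hits"`
	} `json:"hits"`
}

// storedKeys collects every stored "key" value from a search response.
func storedKeys(raw []byte) ([]int64, error) {
	var resp storedResponse
	if err := json.Unmarshal(raw, &resp); err != nil {
		return nil, err
	}
	var keys []int64
	for _, h := range resp.Hits.Hits {
		keys = append(keys, h.Fields.Key...)
	}
	return keys, nil
}

func main() {
	m, err := mappingBody()
	if err != nil {
		panic(err)
	}
	s, err := storedFieldsBody()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(m)) // body for creating a new index
	fmt.Println(string(s)) // body for a search request
	keys, err := storedKeys([]byte(`{"hits":{"hits":[{"_id":"1","fields":{"key":[7]}}]}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(keys) // [7]
}
```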

2 answers
Answer 1: score 0, 2017-10-16 20:23:05
Answer 2: score 0, 2017-10-17 12:57:27
