How to efficiently get only one field of the documents in Elasticsearch
I'm new to Elasticsearch and have run into some technical difficulties. Currently my docs are time series data stored in hourly indices. What I'm trying to figure out is how to efficiently extract only the key field values, where the field is defined as "key": { "type": "long" }.
I initially tried the naive method, which is scrolling through all indices and extracting the field, but that doesn't finish quickly: each hourly index has about 10M docs, and scrolling 3 indices already takes forever.
Then I came across terms aggregations and tried using the key field as the aggregation term:
"aggregations": {
  "test_group": {
    "terms": {
      "field": "key",
      "size": 100000
    }
  }
}
That gives me better performance, but it is still not sufficient for a real-time system in which users search through the history, because key is a high-cardinality field. Some rough benchmarks:
size = 50k, indices = 4, time range = 3hrs: 7.1s
size = 100k, indices = 4, time range = 3hrs: 7.669s
size = 1m, indices = 4, time range = 2hrs: 12.669s
size = 1m, indices = 4, time range = 3hrs: 14.669s
This is not the end of it, because I'm using an Elasticsearch Go library to parse the output and do some processing, which adds non-trivial time to the overall response.
My question is: is this already the best ES can do? Are there any other ways that I'm missing? I'm currently on ES 5.6 with a 3-node cluster, all on Amazon i3-4xl instances. Thanks.
If I understand your question correctly, you are trying to retrieve a specific field called 'key' from your documents, and I assume the documents being returned also contain other fields that you don't care about?
If so, try this:
GET /_search
{
  "_source": {
    "includes": ["key"]
  }
}
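To show how that source filter might be combined with the scroll the question describes, here is a minimal Python sketch. It only builds the request body and reads keys back out of a response; the helper names, the page size of 10000, and the sample response are all made up for illustration, and the body would still have to be sent through whatever client you use. Sorting by _doc is the order the Elasticsearch docs recommend for scrolls when result order doesn't matter.

```python
import json


def build_key_only_query(page_size=10000):
    """Build a search body that returns only the `key` field of each hit.

    `page_size` is an assumed value, not something from the thread.
    """
    return {
        "size": page_size,                  # hits per scroll page
        "sort": ["_doc"],                   # index order, cheapest for scrolling
        "_source": {"includes": ["key"]},   # drop every other field
        "query": {"match_all": {}},
    }


def extract_keys(response):
    """Collect the `key` values from a search response dict."""
    return [hit["_source"]["key"] for hit in response["hits"]["hits"]]


# Hand-written sample in the shape of a search response (not real output).
sample_response = {
    "hits": {
        "hits": [
            {"_id": "a", "_source": {"key": 101}},
            {"_id": "b", "_source": {"key": 102}},
        ]
    }
}

if __name__ == "__main__":
    print(json.dumps(build_key_only_query(), indent=2))
    print(extract_keys(sample_response))
```

Note that this only shrinks what is sent back per hit; it still visits every document, so on its own it may not fix the scroll time.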
I am not exactly sure what you are trying to achieve, but retrieving a single field from a document usually calls for setting the store parameter to true in the mapping, so that the field doesn't need to be parsed out of the _source field.
Check the docs: https://www.elastic.co/guide/en/elasticsearch/reference/current/number.html#number-params
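As a rough illustration of reading the field without touching _source, the Python sketch below builds two request bodies. The first uses stored_fields, which assumes the mapping has "store": true on key. The second uses docvalue_fields, a related option that relies on doc values (enabled by default for numeric fields), so it may work without a mapping change. The function names and the sample response are hypothetical, and I have not benchmarked either variant against the setup in the question.

```python
import json


def build_stored_fields_query(page_size=10000):
    """Request only the stored `key` field (assumes "store": true in the mapping)."""
    return {
        "size": page_size,            # assumed page size
        "sort": ["_doc"],
        "stored_fields": ["key"],     # _source is not returned with stored_fields
        "query": {"match_all": {}},
    }


def build_doc_values_query(page_size=10000):
    """Request `key` from doc values and skip loading _source entirely."""
    return {
        "size": page_size,
        "sort": ["_doc"],
        "_source": False,             # serialized as JSON false
        "docvalue_fields": ["key"],
        "query": {"match_all": {}},
    }


def extract_field_values(response, field="key"):
    """Both variants return values as arrays under each hit's "fields" object."""
    values = []
    for hit in response["hits"]["hits"]:
        values.extend(hit.get("fields", {}).get(field, []))
    return values


# Hand-written sample in the shape of such a response (not real output).
sample_fields_response = {
    "hits": {
        "hits": [
            {"_id": "a", "fields": {"key": [101]}},
            {"_id": "b", "fields": {"key": [102]}},
            {"_id": "c"},  # a hit without the field is skipped
        ]
    }
}

if __name__ == "__main__":
    print(json.dumps(build_doc_values_query(), indent=2))
    print(extract_field_values(sample_fields_response))
```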