Elasticsearch: Which nested mapping should I use to store aggregation results?
Let's say I have a lot of Elasticsearch documents like the following:
{
"_index": "f2016-07-17",
"_type": "trkvjadsreqpxl.gif",
"_id": "AVX2N3dl5siG6SyfyIjb",
"_score": 1,
"_source": {
"time": "1468714676424",
"meta": {
"cb_id": 25681,
"mt_id": 649,
"c_id": 1592,
"revenue": 2.5,
"mt_name": "GMS-INAPP-EN-2.5",
"c_description": "COULL-INAPP-EN-2.5",
"domain": "wv.inner-active.mobi",
"master_domain": "649###wv.inner-active.mobi",
"child_domain": "1592###wv.inner-active.mobi",
"combo_domain": "25681###wv.inner-active.mobi",
"ip": "52.42.87.73"
}
}....
}
My goal is to run a simple date histogram aggregation with a terms sub-aggregation, and then insert the aggregated result back into a new index/structure.
The aggregation is:
{
"aggs": {
"hour":{
"date_histogram": {
"field": "time",
"interval": "hour"
},
"aggs":{
"hour_m_tag":{
"terms":{
"field":"meta.mt_id"
}
}
}
}
}
}
The result is as expected:
"aggregations": {
"hour": {
"buckets": [
{
"key_as_string": "2016-07-17T00:00:00.000Z",
"key": 1468713600000,
"doc_count": 94411750,
"hourly_m_tag": {
"doc_count_error_upper_bound": 1485,
"sum_other_doc_count": 30731646,
"buckets": [
{
"key": 10,
"doc_count": 10175501
},
{
"key": 649,
"doc_count": 200000
}....
]
}
},
{
"key_as_string": "2016-07-17T01:00:00.000Z",
"key": 1468717200000,
"doc_count": 68738743,
"hourly_m_tag": {
"doc_count_error_upper_bound": 2115,
"sum_other_doc_count": 22478590,
"buckets": [
{
"key": 559,
"doc_count": 8307018
},
{
"key": 649,
"doc_count" :100000
}...
I can parse the result without any problem, and I want to store it back into a new index.
Which nested mapping should I use on the new index so that I can fetch the aggregated data later?
Expected data structure:
{
"hour": [
{
"time": "00:00",
"child_tag": {
"300": 100,
"310": 200
},
"master_tag": {
"1000": 300,
"1001": 400
"1010": 400
}
},
{
"time": "01:00",
"child_tag": {
"300": 500,
"310": 600
},
"master_tag": {
"1000": 700,
"1010": 800
}
}
]...
}
PS
The later aggregation should sum the values per master_tag/child_tag key across hours.
For instance, a query between 00:00 and 01:00 should return:
{
"child_tag": {
"300": 600,//100+500
"310": 800 //200+600
},
"master_tag": {
"1000": 1000, //300+700
"1001": 400
"1010": 1200 //400+800
}
}
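To make the intended arithmetic concrete, here is a minimal client-side sketch in Python of the per-key summation described above. The function name `sum_tags` and its parameters are hypothetical, and it assumes the hourly entries have already been loaded as plain dictionaries shaped like the expected data structure; it is not how Elasticsearch itself would compute the result.

```python
from collections import Counter


def sum_tags(hourly_docs, start, end, fields=("child_tag", "master_tag")):
    """Sum per-key counts over all hourly docs with start <= time <= end.

    Times are zero-padded "HH:MM" strings, so plain string comparison
    orders them correctly within a single day.
    """
    totals = {field: Counter() for field in fields}
    for doc in hourly_docs:
        if start <= doc["time"] <= end:
            for field in fields:
                # Counter.update adds counts for keys that already exist.
                totals[field].update(doc.get(field, {}))
    return {field: dict(counter) for field, counter in totals.items()}


hours = [
    {"time": "00:00",
     "child_tag": {"300": 100, "310": 200},
     "master_tag": {"1000": 300, "1001": 400, "1010": 400}},
    {"time": "01:00",
     "child_tag": {"300": 500, "310": 600},
     "master_tag": {"1000": 700, "1010": 800}},
]
result = sum_tags(hours, "00:00", "01:00")
```

Keys that are missing in one hour (such as "1001" at 01:00) simply contribute nothing to the total.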
Thanks a lot!
According to your comment and edits, I suggest storing one document per hour in your new index, so that it is easier to query documents for specific hours.
The mapping I suggest is as follows:
PUT /agg_index
{
"mappings": {
"my_type": {
"properties": {
"time": {
"type": "date",
"format": "HH:mm"
},
"child_tag": {
"type": "nested"
},
"master_tag": {
"type": "nested"
}
}
}
}
}
Then you can index your new documents like this:
PUT /agg_index/doc/1
{
"time": "00:00",
"child_tag": {
"300": 100,
"310": 200
},
"master_tag": {
"1000": 300,
"1001": 400,
"1010": 400
}
}
PUT /agg_index/doc/2
{
"time": "01:00",
"child_tag": {
"300": 500,
"310": 600
},
"master_tag": {
"1000": 700,
"1010": 800
}
}
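As a rough sketch of how such hourly documents could be produced from the aggregation response shown in the question, the Python function below walks the `hour` buckets and flattens each terms sub-aggregation into a key-to-count object. The function name `buckets_to_hourly_docs` and the `sub_aggs` parameter (mapping a sub-aggregation name to an output field name) are assumptions made for illustration. The question's response only contains `hourly_m_tag`, so a similar terms aggregation for the child tag would have to be added to get `child_tag`.

```python
from datetime import datetime, timezone


def buckets_to_hourly_docs(response, sub_aggs):
    """Turn date_histogram buckets into one document per hour.

    sub_aggs maps a terms sub-aggregation name in the response
    (e.g. "hourly_m_tag") to the output field (e.g. "master_tag").
    """
    docs = []
    for hour_bucket in response["aggregations"]["hour"]["buckets"]:
        # Bucket keys are epoch milliseconds in UTC.
        ts = datetime.fromtimestamp(hour_bucket["key"] / 1000, tz=timezone.utc)
        doc = {"time": ts.strftime("%H:%M")}
        for agg_name, field_name in sub_aggs.items():
            doc[field_name] = {
                str(b["key"]): b["doc_count"]
                for b in hour_bucket[agg_name]["buckets"]
            }
        docs.append(doc)
    return docs


sample = {"aggregations": {"hour": {"buckets": [
    {"key": 1468713600000, "doc_count": 94411750,
     "hourly_m_tag": {"buckets": [{"key": 10, "doc_count": 10175501},
                                  {"key": 649, "doc_count": 200000}]}},
    {"key": 1468717200000, "doc_count": 68738743,
     "hourly_m_tag": {"buckets": [{"key": 559, "doc_count": 8307018},
                                  {"key": 649, "doc_count": 100000}]}},
]}}}
docs = buckets_to_hourly_docs(sample, {"hourly_m_tag": "master_tag"})
```

Each returned dictionary can then be sent as the body of one index request.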
And you'll be able to query documents and run aggregations on the nested child_tag and master_tag elements.
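One possible shape for such a request, sketched in Python as a plain dictionary builder: a range query on `time` combined with a `nested` aggregation that holds one `sum` per tag key. The helper name `build_sum_request` is hypothetical, and the sketch assumes the tag keys (for example "300" and "310") are known in advance, since each key is stored as its own field under the nested object. Treat it as a starting point to verify against your Elasticsearch version rather than a definitive query.

```python
def build_sum_request(path, keys, start, end):
    """Build a search body that sums each tag key under a nested path.

    path  -- nested field name, e.g. "child_tag"
    keys  -- tag ids expected under that path, e.g. ["300", "310"]
    start, end -- inclusive "HH:mm" bounds for the time field
    """
    return {
        "size": 0,  # only the aggregation result is needed
        "query": {"range": {"time": {"gte": start, "lte": end}}},
        "aggs": {
            path: {
                "nested": {"path": path},
                "aggs": {
                    key: {"sum": {"field": f"{path}.{key}"}} for key in keys
                },
            }
        },
    }


body = build_sum_request("child_tag", ["300", "310"], "00:00", "01:00")
```

The resulting `body` would be posted to the search endpoint of the new index; the same helper can be called with "master_tag" and its keys.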