How to use elasticsearch.helpers.scan to obtain the score of the aggregation result
There is an Elasticsearch index with only post_id and created_at.
I'd like to group by post_id as the key.
If I use search() as follows, I can get the score (the doc_count) for each post_id.
res = elastic.search(index='play_post',
                     body={
                         "size": 0,
                         "query": {
                             "range": {
                                 "created_at": {
                                     "gte": start_date,
                                     "lte": end_date
                                 }
                             }
                         },
                         "aggs": {
                             "group_by_post_id": {
                                 "terms": {
                                     "field": "post_id"
                                 }
                             }
                         }
                     },
                     request_timeout=300)
The result is as follows.
{u'hits': {u'hits': [], u'total': 2606639, u'max_score': 0.0}, u'_shards': {u'successful': 5, u'failed': 0, u'total': 5}, u'took': 318, u'aggregations': {u'group_by_post_id': {u'buckets': [{u'key': 29062, u'doc_count': 136}, {u'key': 2499828, u'doc_count': 122}, {u'key': 2422738, u'doc_count': 66}, {u'key': 174648, u'doc_count': 65}, {u'key': 1928122, u'doc_count': 65}, {u'key': 2012556, u'doc_count': 62}, {u'key': 377819, u'doc_count': 56}, {u'key': 2856270, u'doc_count': 55}, {u'key': 1417120, u'doc_count': 48}, {u'key': 238278, u'doc_count': 47}], u'sum_other_doc_count': 2605917, u'doc_count_error_upper_bound': 32}}, u'timed_out': False}
Now, because of the large amount of data stored in Elasticsearch, I tried to use elasticsearch.helpers.scan to get the data, as below.
res = elasticsearch.helpers.scan(elastic,
                                 index='play_post',
                                 scroll='2m',
                                 query={
                                     "size": 0,
                                     "query": {
                                         "range": {
                                             "created_at": {
                                                 "gte": start_date,
                                                 "lte": end_date
                                             }
                                         },
                                     },
                                     "aggs": {
                                         "group_by_post_id": {
                                             "terms": {
                                                 "field": "post_id"
                                             }
                                         }
                                     }
                                 },
                                 request_timeout=300)
However, the result does not include the score of how many documents there are for each post_id, as shown below.
{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1281625, u'created_at': u'2018-04-14T19:29:11', u'user_id': 377765}, u'_score': None, u'_index': u'play_post', u'_id': u'd45d181c-0d2f-4bc9-aaa8-46fa5c41b748'}
{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1632815, u'created_at': u'2018-04-15T13:09:56', u'user_id': 78467}, u'_score': None, u'_index': u'play_post', u'_id': u'cd279f13-42ee-4981-97c7-c18668a9b624'}
{u'sort': [0], u'_type': u'play_post', u'_source': {u'post_id': 1135965, u'created_at': u'2018-04-15T11:58:54', u'user_id': 318212}, u'_score': None, u'_index': u'play_post', u'_id': u'475f7199-4b20-4484-959a-873c38660180'}
...
Please tell me how to do it.
A scroll query has no score because computing it is expensive when such a large number of documents is retrieved. Someone was also annoyed by this behaviour on the GitHub page of an Elasticsearch client project: https://github.com/olivere/elastic/issues/661
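As far as I can tell, elasticsearch.helpers.scan yields only the individual hits, so the aggregation buckets never show up in its output. If the hits are being scanned anyway, the per-post_id counts can be tallied on the client. The sketch below assumes hits shaped like the scan output in the question; count_post_ids and sample_hits are hypothetical names used only for illustration, and the sample data stands in for a real cluster.

```python
from collections import Counter


def count_post_ids(hits):
    """Tally how many hits share each post_id.

    `hits` is any iterable of hit dicts with a `_source.post_id` field,
    for example the generator returned by elasticsearch.helpers.scan.
    """
    counts = Counter()
    for hit in hits:
        counts[hit["_source"]["post_id"]] += 1
    return counts


# Hypothetical stand-in data so the sketch runs without a cluster.
sample_hits = [
    {"_source": {"post_id": 1281625, "created_at": "2018-04-14T19:29:11"}},
    {"_source": {"post_id": 1632815, "created_at": "2018-04-15T13:09:56"}},
    {"_source": {"post_id": 1281625, "created_at": "2018-04-15T11:58:54"}},
]

counts = count_post_ids(sample_hits)
# most_common(n) returns the n largest buckets as (post_id, count) pairs.
top_posts = counts.most_common(10)
```

With a real client, the generator from scan could be passed in place of sample_hits. Unlike the terms aggregation, this pulls every matching document over the network, so it is only reasonable when the documents are needed anyway.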
To get the same result as a scroll query, but with a score and a little slower, you could use the search after query (see the search after documentation). You only need a date field and another field that uniquely identifies a document; an _id field or an _uid field is enough. Take a look at this answer: "Elastic search not giving data with big number for page size".
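A minimal sketch of paging with search after is shown below. It assumes a callable with the same keyword interface as the client's search method (index and body), and a sort on a date field plus a unique tiebreaker field. The function name iter_search_after and the page_size parameter are hypothetical, not part of the elasticsearch library.

```python
def iter_search_after(search, index, query, sort, page_size=1000):
    """Yield every matching hit by paging with search_after.

    `search` is assumed to behave like Elasticsearch.search: it accepts
    index= and body= and returns a dict containing hits.hits, where each
    hit carries the `sort` values it was ordered by.
    """
    body = {"size": page_size, "query": query, "sort": sort}
    while True:
        hits = search(index=index, body=body)["hits"]["hits"]
        if not hits:
            return
        for hit in hits:
            yield hit
        # The sort values of the last hit mark where the next page starts.
        body["search_after"] = hits[-1]["sort"]


# Hypothetical usage with a real client named `elastic`:
# for hit in iter_search_after(
#         elastic.search, "play_post",
#         {"range": {"created_at": {"gte": start_date, "lte": end_date}}},
#         [{"created_at": "asc"}, {"_id": "asc"}]):
#     print(hit["_source"]["post_id"])
```

The tiebreaker field matters because two documents with the same created_at would otherwise have identical sort values, and the next page could skip or repeat one of them.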