Elasticsearch scroll scan query doesn't return all documents, missing first set

I'm trying to scroll through my ES index and grab all the documents, but I keep missing the first set of documents returned by the initial scroll request. For example, if my scroll size is 10 and my query matches 100 documents in total, I end up with only 90 after scrolling. Any suggestions on what I'm missing?

Here's what I've tried so far:

$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';

$params = [
    "scroll" => "1m",
    "size" => 50,
    "index" => "myindex",
    "type" => "mytype",
    "body" => $json 
];

$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];

print " total results:   " . $scroll_size;

//scroll
$count = 0;
while ($scroll_size > 0) {
    print "  SCROLLING...";
    $scroll_results = $client->scroll([
        'scroll_id' => $s_id,
        'scroll' => '1m'
    ]);

    // get number of results returned in the last scroll
    $scroll_size = sizeof($scroll_results['hits']['hits']);
    print "  scroll size: " . $scroll_size;

    // do something with results
    for ($i=0; $i<$scroll_size; $i++) {
        $count++;
    }
}
print " total id count: " . $count;

The first query you execute to get the number of documents also returns documents. It both establishes the scroll and returns the first set of results. Once you have processed that first set, you can use the scroll_id to get the next page, and so on.

Thanks @Ramdev. Yeah, I realized that after a little digging. For anyone else, here's what ended up working for me:

$json = '{"query":{"bool":{"must":[{"match_all":{}}]}}}';
$count = 0;
$params = [
    "scroll" => "1m",
    "size" => 50,
    "index" => "myindex",
    "type" => "mytype",
    "body" => $json 
];

$results = $client->search($params);
$scroll_size = $results['hits']['total']; // returns total docs that match query
$s_id = $results['_scroll_id'];

print " total results:   " . $scroll_size;

// first set of results, returned by the initial search
$first_size = sizeof($results['hits']['hits']);
for ($i=0; $i<$first_size; $i++) {
    $count++;
}
//scroll
while ($scroll_size > 0) {
    print "  SCROLLING...";
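    $scroll_results = $client->scroll([
        'scroll_id' => $s_id,
        'scroll' => '1m'
    ]);
    // the scroll id can change between requests, so keep the latest one
    $s_id = $scroll_results['_scroll_id'];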

    // get number of results returned in the last scroll
    $scroll_size = sizeof($scroll_results['hits']['hits']);
    print "  scroll size: " . $scroll_size;

    // do something with results
    for ($i=0; $i<$scroll_size; $i++) {
        $count++;
    }
}
print " total id count: " . $count;
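
The loop above can also be written so that the first page and every later page go through the same code path, which makes it harder to forget the hits returned by the initial search. The following is only a sketch: `FakeScrollClient` and `countAllHits` are hypothetical names invented here, and the fake client merely imitates the response shape used in the code above (`_scroll_id`, `hits.total`, `hits.hits`) so the example runs without a cluster. With the real client, the same loop would call `$client->search()` and `$client->scroll()` instead.

```php
<?php
// Hypothetical stand-in for the Elasticsearch PHP client, used only so this
// sketch runs without a cluster. It mimics the response shape used above.
class FakeScrollClient
{
    private $docs;
    private $size = 10;
    private $offset = 0;

    public function __construct(array $docs)
    {
        $this->docs = $docs;
    }

    // Starts a "scroll" and returns the first page of hits.
    public function search(array $params)
    {
        $this->size = $params['size'];
        $this->offset = 0;
        return $this->nextPage();
    }

    // Returns the next page of hits; an empty page means the scroll is done.
    public function scroll(array $params)
    {
        return $this->nextPage();
    }

    private function nextPage()
    {
        $hits = array_slice($this->docs, $this->offset, $this->size);
        $this->offset += count($hits);
        return [
            '_scroll_id' => 'scroll-' . $this->offset,
            'hits' => ['total' => count($this->docs), 'hits' => $hits],
        ];
    }
}

// Counts every hit. The search() response is treated as page one, and the
// loop stops as soon as a page comes back empty.
function countAllHits($client, array $params)
{
    $count = 0;
    $response = $client->search($params);

    while (count($response['hits']['hits']) > 0) {
        foreach ($response['hits']['hits'] as $hit) {
            $count++; // do something with $hit here
        }
        // Always pass the most recent scroll id to the next request.
        $response = $client->scroll([
            'scroll_id' => $response['_scroll_id'],
            'scroll' => '1m',
        ]);
    }
    return $count;
}

$client = new FakeScrollClient(range(1, 100));
$total = countAllHits($client, ['scroll' => '1m', 'size' => 10, 'index' => 'myindex']);
print "total id count: " . $total . "\n";
```

Because the page is processed before the next `scroll()` call is made, no separate loop is needed for the first set of results.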
