
Elasticsearch scroll API search with “from”

I have a script that generates sitemaps based on an index in the URL, http://example.com/sitemap.index.xml, where index is a number greater than 0 that determines which results are included in each chunk.

$chunk = 10000;
$counter = 0;

$scroll = $es->search(array(
    "index" => "index",
    "type" => "type",
    "scroll" => "1m",
    "search_type" => "scan",
    "size" => 10,
    "from" => $chunk * ($index - 1)
));
$sid = $scroll['_scroll_id'];

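while ($counter < $chunk) {
    $docs = $es->scroll(array(
        "scroll_id" => $sid,
        "scroll" => "1m"
    ));
    $sid = $docs['_scroll_id'];
    $hits = count($docs['hits']['hits']);
    if ($hits === 0) {
        break; // scroll is exhausted; without this the loop would never end
    }
    $counter += $hits;
}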

// ...

Now each time I access http://example.com/sitemap.1.xml or http://example.com/sitemap.2.xml, the results returned from ES are exactly the same. Each scroll request returns 50 results (10 per shard), but the from value does not seem to be taken into account, whether it is 0 or 10000.

I'm using elasticsearch-php as ES library.

Any ideas?

In Java, a scan-and-scroll loop over all matching documents can be written as follows:

QueryBuilder query = QueryBuilders.matchAllQuery();
long total = 0; // running count of hits seen so far
// Open the scroll; with SearchType.SCAN the size applies per shard.
SearchResponse scrollResp = Constants.client.prepareSearch(index)
        .setTypes(type).setSearchType(SearchType.SCAN)
        .setScroll(new TimeValue(600000)) // keep the scroll alive for 10 minutes
        .setQuery(query)
        .setSize(500).execute().actionGet();
while (true) {
    // Fetch the next page, renewing the keep-alive each time
    scrollResp = Constants.client
            .prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(600000)).execute().actionGet();
    System.out.println("Record count: "
            + scrollResp.getHits().getHits().length);
    total = total + scrollResp.getHits().getHits().length;
    System.out.println("Total record count: " + total);
    for (SearchHit hit : scrollResp.getHits()) {
        // handle the hit
    }
    // Break condition: no hits are returned
    if (scrollResp.getHits().getHits().length == 0) {
        System.out.println("All records are fetched");
        break;
    }
}
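The loop above reads every page, but it does not by itself select one sitemap chunk. Since the scroll in the question does not seem to honor from, one possible workaround is to skip the first chunk * (index - 1) hits on the client side and keep the next chunk hits. The sketch below shows only that skip-and-collect logic, in plain Java with no Elasticsearch dependency. ScrollChunk and collect are hypothetical names, and the pages iterator is assumed to yield the hits of each successive scroll response.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of any Elasticsearch client library.
final class ScrollChunk {
    private ScrollChunk() {}

    /**
     * Collects the hits that belong to one chunk from a sequence of scroll pages.
     *
     * @param pages      successive pages of hits (assumed to come from scroll calls)
     * @param chunkSize  number of hits per chunk, e.g. 10000
     * @param chunkIndex 1-based chunk number, as in sitemap.1.xml
     */
    static <T> List<T> collect(Iterator<List<T>> pages, int chunkSize, int chunkIndex) {
        if (chunkSize <= 0 || chunkIndex <= 0) {
            throw new IllegalArgumentException("chunkSize and chunkIndex must be positive");
        }
        // Number of leading hits that belong to earlier chunks.
        long toSkip = (long) chunkSize * (chunkIndex - 1);
        List<T> result = new ArrayList<>();
        while (pages.hasNext() && result.size() < chunkSize) {
            List<T> page = pages.next();
            if (page.isEmpty()) {
                break; // an empty page means the scroll is exhausted
            }
            for (T hit : page) {
                if (toSkip > 0) {
                    toSkip--; // still inside an earlier chunk
                    continue;
                }
                if (result.size() == chunkSize) {
                    break; // chunk is full; the rest of the page belongs to the next chunk
                }
                result.add(hit);
            }
        }
        return result;
    }
}
```

With a chunk size of 10000, a call such as ScrollChunk.collect(pages, 10000, 2) would skip the first 10000 hits and return the next 10000. Every earlier hit is still fetched and discarded, so the cost grows with the chunk index, and the chunks are only stable if the scroll returns documents in the same order on each run.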

Hope it helps.
