
PHP Ruflin/Elastica - how to refresh index on huge data insert

I need to insert about 1.5 million documents into an Elasticsearch database. I do it via the PHP library Elastica, following this example (BULK example).

I would like to know whether it is possible to call $elasticaType->getIndex()->refresh(); once, at the very end of the bulk insertion, and whether that is safe and faster than calling $elasticaType->getIndex()->refresh(); after every bulk request. I mean something like this:

$offset = 0;
$limit = 500;
$sum = 1500000;

while( $offset < $sum )
{
    $documents = [];
    $rows = $sqlDatabase->getData( $offset, $limit );

    foreach( $rows as $row )
    {
        $docData = ['name' => $row->name, 'email' => $row->email];
        $documents[] = new \Elastica\Document( $row->id, $docData );
    }

    $elasticaType->addDocuments( $documents );
    $offset += $limit;
    // The source example refreshes here, after every 500 items. But I want it only at the very end of the code, after all 1500000 items are in the database.
    // $elasticaType->getIndex()->refresh();
}

$elasticaType->getIndex()->refresh();  // This is what I want.

Is it possible to insert 1500000 documents into Elasticsearch and only then call $elasticaType->getIndex()->refresh(); ?

Is it possible to insert 1500000 documents into Elasticsearch and only then call $elasticaType->getIndex()->refresh();?

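Definitely yes. Refreshing once at the end is safe, and it is generally faster than refreshing after every bulk request, because each refresh is a relatively expensive operation.

A refresh makes your documents available for search. The mechanism comes from Apache Lucene, which provides near real-time (NRT) search capabilities; it uses DirectoryReader.openIfChanged to reopen the index.

Usually you don't have to do it yourself: a refresh is scheduled periodically by default (refresh_interval defaults to 1s). You can set refresh_interval to a shorter time for NRT search, or to a longer one for better indexing performance.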
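As a rough illustration of the refresh_interval point, the sketch below switches periodic refresh off for the duration of a bulk load, then restores it and refreshes once. It assumes ruflin/Elastica is installed through Composer and that your version exposes setRefreshInterval() on the index settings object; the helper name loadWithRefreshPaused and the '1s' restore value are my own choices for the example, not something from the question or from Elastica itself.

```php
<?php
// Sketch only: requires ruflin/Elastica (Composer autoload) and a reachable cluster.
use Elastica\Index;

/**
 * Hypothetical helper: runs $load with periodic refresh disabled,
 * then restores the interval and refreshes the index once.
 */
function loadWithRefreshPaused(Index $index, callable $load, string $restoreInterval = '1s'): void
{
    $settings = $index->getSettings();

    // '-1' disables the scheduled refresh while documents are being indexed.
    $settings->setRefreshInterval('-1');

    try {
        $load();
    } finally {
        // Restore the interval even if the load throws. '1s' is the
        // Elasticsearch default and is assumed here; use whatever value
        // the index had before.
        $settings->setRefreshInterval($restoreInterval);

        // Single explicit refresh so everything indexed becomes searchable.
        $index->refresh();
    }
}
```

With the names from the question, the while loop (without its final refresh() call) would go inside the callback: loadWithRefreshPaused($elasticaType->getIndex(), function () use ($sqlDatabase, $elasticaType) { /* bulk loop */ });. Whether this helps noticeably depends on the cluster and the data, so it is worth measuring against the plain loop.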
