Lucene performance

Could you please suggest steps to follow for good Lucene performance, especially with large data (around 1 TB of PDF files to be indexed)?

  1. Read Scaling Lucene and Solr.
  2. Define your needs from Lucene (for example: you are indexing PDFs. Do you need to store the full text, only make it searchable, or neither?)
  3. Run a small-scale experiment: index a few documents and see whether retrieval is good enough.
  4. Try to index the whole thing (following the paper's tips for quick indexing and for indexing for retrieval speed). Is retrieval good enough? Is performance good enough?
  5. Iterate.
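
Before moving from step 3 to step 4, it can help to extrapolate the small-scale run to the full 1 TB so you know roughly what to expect. Below is a minimal sketch in plain Java, with no Lucene calls. The class `IndexingEstimate` and its method names are hypothetical helpers, and the sample figures in `main` are made up for illustration. It assumes indexing time and index size scale linearly with input size, which is optimistic: segment merging usually makes large runs slower than a linear projection, so treat the result as a lower bound.

```java
import java.time.Duration;

/** Extrapolates a small indexing trial to the full corpus (hypothetical helper, not part of Lucene). */
final class IndexingEstimate {

    /** Input bytes indexed per second during the trial run. */
    static double throughputBytesPerSecond(long sampleBytes, Duration elapsed) {
        if (sampleBytes <= 0 || elapsed.isZero() || elapsed.isNegative()) {
            throw new IllegalArgumentException("sample size and elapsed time must be positive");
        }
        return sampleBytes / (elapsed.toMillis() / 1000.0);
    }

    /** Projected wall-clock time for the full corpus, assuming linear scaling. */
    static Duration estimateFullRun(long sampleBytes, Duration elapsed, long totalBytes) {
        double seconds = totalBytes / throughputBytesPerSecond(sampleBytes, elapsed);
        return Duration.ofSeconds(Math.round(seconds));
    }

    /** Projected on-disk index size, assuming the index-to-input ratio stays constant. */
    static long estimateIndexBytes(long sampleBytes, long sampleIndexBytes, long totalBytes) {
        if (sampleBytes <= 0) {
            throw new IllegalArgumentException("sample size must be positive");
        }
        return Math.round((double) sampleIndexBytes / sampleBytes * totalBytes);
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // Illustrative numbers only: a 10 GiB sample indexed in 20 minutes,
        // producing a 2 GiB index, projected onto a 1 TiB corpus.
        long sampleBytes = 10 * gib;
        long sampleIndexBytes = 2 * gib;
        long totalBytes = 1024 * gib;
        Duration trial = Duration.ofMinutes(20);

        Duration fullRun = estimateFullRun(sampleBytes, trial, totalBytes);
        long indexBytes = estimateIndexBytes(sampleBytes, sampleIndexBytes, totalBytes);

        System.out.println("Projected indexing time (hours): " + fullRun.toHours());
        System.out.println("Projected index size (GiB): " + indexBytes / gib);
    }
}
```

If the projected time or index size is unacceptable, that is the signal to revisit step 2 (for example, whether the full text really needs to be stored) before committing to the full run.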

Please check the tips on the question Optimizing Lucene Performance. Since you are working with a large amount of data, you also need to watch index creation performance. Tips on improving both indexing performance and search performance are available on the Lucene Wiki.
