
What are the downsides of using Lucene?

I'm thinking about using Lucene in my project to do very fast searches. I know that Lucene creates its own files where it keeps all the data/indexes.

What are the downsides of using Lucene? Are there any?

Do you have to do any maintenance on the index files, or do they work well without any outside help?

PS: I know there is also Lucene.NET, and I bet the same rules apply there.

Lucene is great. Very flexible, surprisingly fast, and a solid API. The mailing list is extremely helpful.

The files do need a bit of maintenance, but it can be done with provided tools. Of primary importance is optimizing the index on occasion, but this is only needed if you update the index regularly.

I would suggest looking into Solr as well. It's essentially a webapp and a set of tools that sit on top of Lucene. It makes it a tad easier to create new indexes and keep them optimized, and it provides master/slave synchronization for a scalable search cluster. This, of course, depends on your actual needs.

For a personal example, I used to maintain a search index for a large, well-known gaming company. The index had hundreds of thousands of entries in multiple languages and locales (world-wide). It performed a million searches each day on the cluster using hardly any CPU and a reasonable amount of memory. It had been load tested out to around 300 million searches per day on the hardware we had, and it would scale linearly by simply adding more boxes to the cluster. Solr and Lucene were the primary tools for this.
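To put those daily totals in more familiar terms, here is a small sketch that converts them to average queries per second. It assumes load is spread evenly over the day, which real traffic rarely is, so peak rates would be higher. The class and method names are made up for illustration.

```java
final class ThroughputSketch {
    // 24 hours * 60 minutes * 60 seconds = 86,400 seconds in a day.
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    /** Average queries per second for a daily total, assuming uniform load (an assumption). */
    public static double averageQps(long searchesPerDay) {
        return searchesPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        // Daily totals taken from the anecdote above.
        System.out.printf("production: %.1f qps%n", averageQps(1_000_000L));
        System.out.printf("load test:  %.1f qps%n", averageQps(300_000_000L));
    }
}
```

That works out to roughly 11.6 searches per second in production and about 3,472 per second under the load test, averaged over a day.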

If I had to give a downside, it would be the learning curve. There is quite a bit to understand, and if you want a truly optimized solution, you need to know it well. However, this is true of any search tool if you do it yourself. The documentation, wikis, and mailing list provide plenty of support for this ramp-up.

I have limited experience with Lucene, but so far it has been great. The downsides I can see are mainly from a business perspective:

  1. I have to actively make the case for using Lucene to my boss; by default we would use SQL Server. To make the switch I will have to prove beyond doubt that Lucene performs better (and not just similarly) for the use case we have. I guess this one comes down to the "Nobody ever got fired for buying IBM equipment" syndrome.
  2. Ongoing development/bug fixes for Lucene.Net in particular are questionable at this point, again a tougher sell w/o this. I hope the community can rally.

Lucene does great work for many people and companies. Your mileage may vary, though. A possible problem is Lucene's scoring model: it uses a combination of TF/IDF and boolean scoring, while other IR tools use the probabilistic BM25, which is stronger. However, you may work with Lucene for years and find the search results good enough. Also, scaling to many millions of documents is not easy.
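To make the TF/IDF versus BM25 difference concrete, here is a minimal sketch of the textbook forms of both formulas for a single query term. It is not Lucene's implementation: Lucene's classic scoring adds further factors such as length normalization and boosts, which are left out here. The class name, method names, and sample numbers are assumptions for illustration, and `k1 = 1.2`, `b = 0.75` are just commonly used BM25 defaults.

```java
final class ScoringSketch {
    /** Textbook TF-IDF: raw term frequency times log inverse document frequency. */
    public static double tfIdf(int tf, int docFreq, int numDocs) {
        double idf = Math.log((double) numDocs / docFreq);
        return tf * idf;
    }

    /** Okapi BM25 for one term: tf saturates (k1) and is normalized by document length (b). */
    public static double bm25(int tf, int docFreq, int numDocs,
                              double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * tf * (k1 + 1.0) / (tf + lengthNorm);
    }

    public static void main(String[] args) {
        int numDocs = 1000;  // hypothetical collection size
        int docFreq = 10;    // hypothetical number of documents containing the term
        for (int tf : new int[] {1, 2, 5, 20, 100}) {
            System.out.printf("tf=%3d  tfidf=%8.3f  bm25=%6.3f%n",
                    tf,
                    tfIdf(tf, docFreq, numDocs),
                    bm25(tf, docFreq, numDocs, 100.0, 100.0, 1.2, 0.75));
        }
    }
}
```

The TF-IDF score here grows linearly with term frequency, while the BM25 score levels off below `idf * (k1 + 1)`, so a document that repeats a term many times cannot dominate the ranking. Note that newer Lucene releases (6.0 onward) use BM25 as the default similarity, so this particular downside mostly applies to older versions.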

It boils down to your specific use case. It is best to start with a test using Solr and see whether it seems to fit your needs.

Lucene does have scalability issues. Its performance degrades as the index grows larger.

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.

 