
Situations to prefer Apache Lucene over Solr?

Original 2010-05-18 10:43:16 2 5 java/ search/ lucene/ solr/ solrj

There are several advantages to using Solr 1.4: out-of-the-box faceted search, grouping, replication, HTTP administration (versus Luke), and so on.

Even if I embed search functionality in my Java application, I could use SolrJ to avoid the HTTP overhead that comes with Solr. Is SolrJ recommended at all?

So, when would you recommend using "pure Lucene"? Does it perform better or require less RAM? Is it easier to unit-test?

PS: I am aware of this question.

5 answers

If you have a web application, use Solr. I've tried integrating both, and Solr is easier. Otherwise, if you don't need Solr's features (the most important one that comes to mind is faceted search), use Lucene.

If you want to embed search functionality completely within your application and do not want to maintain a separate process like Solr, Lucene is probably preferable. For example, a desktop application might need some search functionality (like the Eclipse IDE, which uses Lucene to search its documentation). You probably don't want that kind of application to launch a heavy process like Solr.

Here is one situation where I have to use Lucene.

Given a set of documents, find the most common terms in them.

Here, I need to access the term vectors of each document (using the low-level TermVectorMapper API). With Lucene that is quite easy.
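To make the counting step concrete, here is a minimal plain-Java sketch. It deliberately does not call Lucene: it assumes each document's term vector has already been read into a `Map<String, Integer>` of term to frequency (for instance by a TermVectorMapper implementation). The `TopTerms` class and `mostCommon` method are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene. Each input map stands in for one
// document's term vector (term -> frequency within that document).
class TopTerms {
    static List<String> mostCommon(List<Map<String, Integer>> termVectors, int k) {
        // Sum the per-document frequencies into one total per term.
        Map<String, Integer> totals = new HashMap<>();
        for (Map<String, Integer> vector : termVectors) {
            for (Map.Entry<String, Integer> e : vector.entrySet()) {
                totals.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(totals.entrySet());
        // Highest total first; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        // Keep at most k terms.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }
}
```

Calling `TopTerms.mostCommon(vectors, 10)` would return up to ten terms ordered by total frequency. For a large index a bounded priority queue would probably be a better fit than sorting every term, but the idea is the same.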

Another use case is very specialized ordering of search results. For example, I want a search for an author name (an author who has written multiple books) to return one book from each store in the first 10 results. In this case, I search each book store separately and, to build the final list, pick one result from each store. You are essentially running multiple searches to generate the final results. Having access to Lucene's low-level APIs definitely helps here.
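The merge step described above could look roughly like the following plain-Java sketch. It assumes the per-store searches have already run and that each store's hits are available as a ranked list of titles. `PerStoreMerge` and `merge` are hypothetical names, and the round-robin policy is just one possible way to "pick one result from each store".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene. hitsByStore maps a store name to
// that store's hits, already ranked best-first by a separate search.
class PerStoreMerge {
    static List<String> merge(Map<String, List<String>> hitsByStore, int limit) {
        List<String> merged = new ArrayList<>();
        int round = 0;
        boolean added = true;
        // Round 0 takes every store's best hit, round 1 every store's second hit,
        // and so on, until the limit is reached or all lists are exhausted.
        while (added && merged.size() < limit) {
            added = false;
            for (List<String> hits : hitsByStore.values()) {
                if (round < hits.size() && merged.size() < limit) {
                    merged.add(hits.get(round));
                    added = true;
                }
            }
            round++;
        }
        return merged;
    }
}
```

Passing a `LinkedHashMap` keeps the stores in a predictable order, so with a limit of 10 every store with at least one hit shows up before any store appears twice (as long as there are no more than 10 stores).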

One more reason to go for Lucene used to be getting new goodies ASAP. This is no longer true, as the two projects have been merged and will have synchronized releases.

I'm surprised nobody has mentioned NRT (near-real-time search), which is available in Lucene but not in Solr (yet).

If you care more about scalability than performance, use Solr; if you care more about performance than scalability, use Lucene.