
Search Engine - Lucene or Solr

We need to integrate a search engine into our product catalog management software. The catalog is expected to hold more than 4-5 million records, with relational data spread over several tables. Our dev platform is ASP.NET 3.5, and we have done some preliminary work with Lucene and found it to be good. However, we just learned about Solr and are looking for practical tips for comparing Lucene and Solr in terms of implementation effort, timeline, regular maintenance, performance, and features. Any guidance or pointers would be really helpful. Thanks.

Lucene:

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-text search.

Solr:

Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, a web administration interface and ...

Essentially, Lucene is embedded in Solr. Lucene is purely a full-text search library, meant to be embedded into projects to give them full-text search capabilities. Solr has many more features and administration capabilities: it lets you search structured data without writing any custom code, load data from CSV files, parse user input tolerantly, run faceted searches, highlight matched text in results, and retrieve search results in a variety of formats (XML, JSON, ...). Check the Solr features page and see whether any feature is relevant for your project.
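To give a feel for what "no custom code" means in practice, here is a minimal sketch of how a client might ask Solr for faceted, highlighted results in JSON. It only builds the request URL and does not contact a server. The host, port, and the core name `catalog` are assumptions for illustration, as are the field names `brand`, `category`, and `description`; `q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, and `hl.fl` are standard Solr query parameters. Older Solr releases expose the handler at a different path, so check the URL against your version.

```python
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust for a real deployment.
SOLR_SELECT_URL = "http://localhost:8983/solr/catalog/select"


def build_select_url(user_query, facet_fields=(), highlight_field=None,
                     rows=10, fmt="json"):
    """Build a Solr select URL with optional faceting and highlighting."""
    params = [("q", user_query), ("rows", str(rows)), ("wt", fmt)]
    if facet_fields:
        # One facet.field parameter per field to count values for.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    if highlight_field:
        # Ask Solr to return highlighted snippets for this field.
        params.append(("hl", "true"))
        params.append(("hl.fl", highlight_field))
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url("wireless mouse",
                           facet_fields=["brand", "category"],
                           highlight_field="description"))
```

Because the whole interface is an HTTP request like this one, an ASP.NET application can talk to Solr with any HTTP client; the Python here is only for illustration.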

I have to agree with Andrew Clegg. I think when a lot of Java developer types look at Lucene vs. Solr, Lucene looks friendlier because it's just a library (POJJ: Plain Old Java Jar!) like any other, and it looks straightforward to embed, versus the complexity of standing Solr up as a separate process that communicates over complex HTTP.

However, I think that for almost all search use cases, Solr is the right approach, because most of the complexity in search is not in the initial integration but in the fuzzy areas of tuning searches, scaling to meet demand, and maintaining your indexes, which cross over from the developer-centric world into the systems world. Solr handles all of those needs nicely.

Like dcruz says, Solr uses Lucene anyway, so it's not a valid comparison.

Lucene is a toolkit for building search apps; Solr is a search app built with Lucene.

IMO you'd be crazy not to use Solr, as it provides you with a lot of 'plumbing' that you'd otherwise have to write yourself, such as a configurable Data Import Handler to pull data out of your RDBMS or XML repositories.
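To make the 'plumbing' concrete: with relational data spread over several tables, somebody has to join the rows and flatten them into one document per product before they can be indexed. The sketch below does that by hand against an in-memory SQLite database. It is not the Data Import Handler and not Solr configuration; it only illustrates the kind of work such an importer takes off your hands. All table and column names (`product`, `brand`, `category`, `product_category`) are made up for the example.

```python
import json
import sqlite3


def build_documents(conn):
    """Join product, brand, and category rows into one flat document per product."""
    rows = conn.execute(
        """
        SELECT p.id, p.name, b.name, c.name
        FROM product p
        JOIN brand b ON b.id = p.brand_id
        LEFT JOIN product_category pc ON pc.product_id = p.id
        LEFT JOIN category c ON c.id = pc.category_id
        ORDER BY p.id, c.name
        """
    )
    docs = {}
    for product_id, name, brand, category in rows:
        # The first row for a product creates its document; later rows for
        # the same product only add values to the multi-valued field.
        doc = docs.setdefault(
            product_id,
            {"id": str(product_id), "name": name, "brand": brand, "category": []},
        )
        if category is not None:
            doc["category"].append(category)
    return list(docs.values())


def demo_connection():
    """Create a tiny hypothetical catalog schema with sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE brand (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, brand_id INTEGER);
        CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE product_category (product_id INTEGER, category_id INTEGER);
        INSERT INTO brand VALUES (1, 'Acme');
        INSERT INTO product VALUES (10, 'Wireless Mouse', 1), (11, 'USB Hub', 1);
        INSERT INTO category VALUES (100, 'Accessories'), (101, 'Input Devices');
        INSERT INTO product_category VALUES (10, 100), (10, 101), (11, 100);
        """
    )
    return conn


if __name__ == "__main__":
    print(json.dumps(build_documents(demo_connection()), indent=2))
```

With plain Lucene (or Lucene.NET) you write and maintain this kind of loop yourself, plus the scheduling and incremental updates around it.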

Plus it gives you a web admin interface and other bells and whistles.

One thing to consider is how difficult it will be to set up your application when you mix these two environments (Java/.NET). If you use the Lucene.NET libraries, you can limit the external dependencies you have to install, which streamlines deployment.

Another thing to consider: do you need the extras that Solr is offering? A(nother) web admin interface is probably great, but it extends your risk envelope. Laying down Java and another service means more patch management. If you stick with .NET only, your patch strategy can be the standard Windows Update model.

Of course, rolling your own implementation using Lucene.NET will have development and maintenance costs of its own, but in my experience it has been straightforward and easy to work with.

We are in exactly the same situation as you. Unfortunately I was not directly involved in the evaluation process, but in the end we're going to use Solr integrated with Lucene.

The main advantage is the variety of formats, as dcruz described. You can query your Solr-Consumer and get your search results back as XML data, which can be easily parsed and displayed on a web page.
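As a rough illustration of how little parsing is involved, here is a sketch that reads a Solr-style XML response with nothing but the standard library. The sample response is hand-written for this example (the documents and field names are invented), but it follows the general layout Solr uses: a `result` element carrying `numFound`, with one `doc` element per hit and typed child elements named by field.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the general shape of a Solr XML response.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">3</int>
  </lst>
  <result name="response" numFound="2" start="0">
    <doc>
      <str name="id">10</str>
      <str name="name">Wireless Mouse</str>
      <arr name="category"><str>Accessories</str><str>Input Devices</str></arr>
    </doc>
    <doc>
      <str name="id">11</str>
      <str name="name">USB Hub</str>
      <arr name="category"><str>Accessories</str></arr>
    </doc>
  </result>
</response>"""


def parse_solr_xml(xml_text):
    """Return (total hit count, list of documents as dicts) from a response."""
    root = ET.fromstring(xml_text)
    result = root.find("result[@name='response']")
    total = int(result.get("numFound"))
    docs = []
    for doc_el in result.findall("doc"):
        doc = {}
        for field in doc_el:
            name = field.get("name")
            if field.tag == "arr":
                # Multi-valued field: collect the text of each child element.
                doc[name] = [child.text for child in field]
            else:
                # Single-valued field: kept as text, with no type conversion.
                doc[name] = field.text
        docs.append(doc)
    return total, docs


if __name__ == "__main__":
    total, docs = parse_solr_xml(SAMPLE_RESPONSE)
    print(total, [d["name"] for d in docs])
```

The same few lines translate directly to an XML reader on the .NET side, which is why the output format matters for a mixed Java/.NET setup.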

Let me shift your focus a bit: are you prepared for changes in the architecture of your product? Both Lucene and Solr are implemented in Java, so you will end up running yet another web container to host it (and hence lose platform purity, so to speak). While Lucene has been ported to .NET (the Lucene.NET project), Solr has not, as far as I know. If you happen to use SQL Server (which is likely, considering your platform), you might consider SQL Server Full-Text Search instead. It has almost the same features (not as feature-rich as Lucene/Solr, but anyway) and is usually much easier to incorporate into an existing application. Besides that, you benefit from simplified maintenance (it comes with your database) and from staying within a single platform.
