
What are the downsides of using Lucene?

I'm thinking about using Lucene in my project to do very fast searches. I know that Lucene creates its own files where it keeps all the data/indexes.

I wonder what the downsides of using Lucene are. Are there any?

Do you have to do anything with the file database, or does it work great without any outside help?

PS: I know there is also Lucene.NET, and I bet the same rules apply there.

Lucene is great. Very flexible, surprisingly fast, and a solid API. The mailing list is extremely helpful.

The files do need a bit of maintenance, but it can be done with the provided tools. Of primary importance is optimizing the index on occasion, but this is only needed if you update the index regularly.
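To give a rough idea of why optimizing matters: Lucene writes an index as a set of segment files, regular updates add more of them, and optimizing merges them back together. The sketch below is a toy model in plain Python, not Lucene's API; `build_segment`, `search`, and `merge_segments` are hypothetical names made up for illustration.

```python
from collections import defaultdict


def build_segment(docs):
    """Build one small inverted index (term -> sorted doc ids) from {doc_id: text}."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search(segments, term):
    """A lookup has to consult every segment and combine the hits."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term, []))
    return sorted(hits)


def merge_segments(segments):
    """Toy 'optimize': fold all segments into a single one."""
    merged = defaultdict(set)
    for segment in segments:
        for term, ids in segment.items():
            merged[term].update(ids)
    return [{term: sorted(ids) for term, ids in merged.items()}]


# Three batches of updates leave three segments behind (hypothetical data).
segments = [
    build_segment({1: "fast search", 2: "lucene index"}),
    build_segment({3: "search index"}),
    build_segment({4: "lucene search"}),
]
optimized = merge_segments(segments)

# Same results either way, but the optimized index has one segment to visit.
assert search(segments, "search") == search(optimized, "search")
```

An index that is built once and never updated stays in this merged state, which is why the maintenance only matters when you update regularly.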

I would suggest looking into Solr as well. It's essentially a webapp and a set of tools that sit on top of Lucene. It makes it a tad easier to create new indexes and keep them optimized, and it provides master/slave synchronization for a scalable search cluster. This, of course, depends on your actual needs.

For a personal example, I used to maintain a search index for a large, well-known gaming company. The index had hundreds of thousands of entries in multiple languages and locales (world-wide). It performed a million searches each day on the cluster using hardly any CPU and a reasonable amount of memory. It had been load tested to around 300 million searches per day on the hardware we had, and it would scale linearly by simply adding more boxes to the cluster. Solr and Lucene were the primary tools for this.

If I had to give a downside, it would be the learning curve. There is quite a bit to understand, and if you want a truly optimized solution, you need to know it well. However, this will happen with any search tool you use, if you do it yourself. The documentation, wikis, and mailing list provide plenty of support for this ramp-up.

I have limited experience with Lucene, though so far it has been great. The downsides I can see are mainly from a business perspective:

  1. I have to actively make the case for using Lucene to my boss; by default we would use SQL Server. To make the switch I will have to prove without a doubt that Lucene performs better (and not just similarly) for the use case we have. I guess this one comes down to the "Nobody ever got fired for buying IBM equipment" syndrome.
  2. Ongoing development and bug fixes for Lucene.Net in particular are questionable at this point, which again makes it a tougher sell. I hope the community can rally.

Lucene does great work for many people and companies. Your mileage may vary, though. A possible problem is Lucene's scoring model: it uses a combination of TF/IDF and boolean scoring, while other IR tools use the probabilistic BM25, which is stronger. However, you may work with Lucene for years and find the search results good enough. Also, scaling to many millions of documents is not easy.
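To make the scoring difference concrete, here is a minimal sketch comparing a textbook TF-IDF weight with the Okapi BM25 formula for a single term. It is not Lucene's exact scoring code (Lucene's own TF/IDF variant differs in its details, and Lucene has since added a BM25 similarity of its own); the function names, the sample numbers, and the common defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math


def tfidf(tf, df, n_docs):
    """Textbook TF-IDF: raw term frequency times log inverse document frequency."""
    if tf == 0:
        return 0.0
    return tf * math.log(n_docs / df)


def bm25(tf, df, n_docs, doc_len, avg_len, k1=1.2, b=0.75):
    """Okapi BM25 score of a single term for a single document."""
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Documents longer than average are penalized, shorter ones boosted.
    length_norm = 1 - b + b * doc_len / avg_len
    # The tf / (tf + const) shape makes repeated occurrences saturate.
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Hypothetical term: appears in 10 of 1000 documents, document of average length.
    for tf in (1, 2, 5, 20, 100):
        print(tf, round(tfidf(tf, 10, 1000), 2), round(bm25(tf, 10, 1000, 100, 100), 2))
```

With raw TF-IDF, doubling the term count doubles the score, so a document that repeats a word many times can dominate the ranking. BM25 flattens out as the count grows and also accounts for document length, which is the usual argument for it.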

It boils down to your specific use case. It is best to start a test using Solr and see whether it seems to fit your needs.

Lucene does have a scalability issue. Its performance degrades as the index grows larger and larger.
