
Search optimization using Apache Lucene

I am working on a project to implement large-scale indexing of Twitter data for search optimization using Apache Lucene. Lucene provides an inverted index, which can be used to filter out the blocks that match the specified selection criteria.
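For readers unfamiliar with the term, here is a minimal sketch of what an inverted index does: it maps each term to the set of documents containing it, so a query only has to look up its terms instead of scanning every document. This is a toy illustration in plain Python, not Lucene's actual implementation; the `tokenize`, `build_inverted_index`, and `search` helpers and the sample tweets are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple terms."""
    return re.findall(r"[a-z0-9_#@]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term of the query (AND)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from the posting set of the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data standing in for tweets.
tweets = {
    1: "Indexing tweets with Lucene",
    2: "Hadoop cluster on Ubuntu",
    3: "Lucene search on a Hadoop cluster",
}
index = build_inverted_index(tweets)
matches = search(index, "lucene hadoop")
```

A real Lucene index also stores term positions, frequencies, and compressed posting lists, none of which this sketch attempts.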

How should I go about implementing this project? Should I install the Cloudera VM and proceed from there, or should I deploy Apache Hadoop on Ubuntu?

I am asking because I have not been able to confirm whether Cloudera already uses Lucene to optimize search.

Please advise.

Cloudera gives you Debian packages and software for automatic installation and cluster management. That's it. There is nothing about search (or its optimization) in the Hadoop stack itself, so you can choose either vanilla Apache Hadoop or Cloudera for your project.

For search you can use Elasticsearch. It integrates with Hadoop and uses Lucene internally.
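As a rough illustration of getting tweet data into Elasticsearch, the sketch below builds the newline-delimited JSON body that the Elasticsearch Bulk API (`POST /_bulk`, content type `application/x-ndjson`) expects: one action line followed by one document line per record. It only builds the payload and does not talk to a cluster. The index name `tweets`, the field names, and the `build_bulk_body` helper are assumptions made for this example.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON bulk payload from (doc_id, document) pairs."""
    lines = []
    for doc_id, doc in docs:
        # Action line: index this document under the given id.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical tweet records; the field names are illustrative only.
sample_docs = [
    ("1", {"user": "alice", "text": "Indexing tweets with Lucene"}),
    ("2", {"user": "bob", "text": "Lucene search on a Hadoop cluster"}),
]
body = build_bulk_body("tweets", sample_docs)
```

The resulting string could then be sent with any HTTP client, or you could skip this step and use an official Elasticsearch client library or the Elasticsearch-Hadoop connector, depending on where the data lives.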

