
Using Solr/Lucene as persistence technology

Solr/Lucene's inverted index and query engine support a subset of RDBMS functionality, i.e. filtering, sorting, grouping, and paging. In this sense it is very close to a NoSQL database, as it also does not support transactions or joins.
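
To make the "subset of RDBMS functionality" point concrete, here is a minimal toy sketch of an inverted index that supports filtering by term, sorting on a stored field, and paging. It is not Lucene's API or its actual data structure; the class `TinyIndex`, its methods, and the `text` field convention are hypothetical names chosen for illustration only.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each term to the ids of documents containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {doc_id, ...}
        self.docs = {}                    # doc_id -> stored fields

    def add(self, doc_id, fields):
        # Re-adding an existing id replaces the old document (update by key).
        self.delete(doc_id)
        self.docs[doc_id] = fields
        for term in fields["text"].lower().split():
            self.postings[term].add(doc_id)

    def delete(self, doc_id):
        old = self.docs.pop(doc_id, None)
        if old is not None:
            for term in old["text"].lower().split():
                self.postings[term].discard(doc_id)

    def search(self, term, sort_by, offset=0, limit=10):
        # Filter through the postings set, then sort and page over stored fields.
        ids = self.postings.get(term.lower(), ())
        hits = [{**self.docs[d], "id": d} for d in ids]
        hits.sort(key=lambda doc: doc[sort_by])
        return hits[offset:offset + limit]
```

Adding, replacing, and deleting by id cover basic CRUD, and `search` covers filter, sort, and page. Nothing here spans multiple documents atomically or combines two document sets, which is the transaction and join gap mentioned above.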

With a framework like Hibernate Search, it is possible to map even complex objects to the index and perform basic CRUD operations while supporting full-text search.

Considerations:

1) Write throughput: In my past experience, a Lucene index's write throughput is much lower than an RDBMS's.

2) Query speed: Query speed for a Lucene index should be comparable, if not faster, due to the inverted index.

3) Scalability: Could be addressed using replication or SolrCloud.

4) Ability to handle large data sets: I have used a Lucene index with 15M+ documents on a single JVM without any performance issues.

Background:

I am currently using MongoDB with Solr and it is working well enough. However, it is not as "simple" as I would like it to be, due to:

  1. Keeping MongoDB and the Solr index in sync (not a trivial task)
  2. Transformation between Java object <-> MongoDB <-> Solr (Spring Data and SolrJ help, but it is still not great)
  3. Why use two "persistence" technologies if one will do?
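
The first two points can be sketched as follows. This is only a rough illustration under stated assumptions: plain dictionaries stand in for MongoDB and Solr, and `DualStore`, `save`, `resync`, and the `index_available` flag are hypothetical names, not part of any real driver or client.

```python
class DualStore:
    """Hypothetical stand-ins: plain dicts play the roles of MongoDB and Solr."""

    def __init__(self):
        self.primary = {}   # system of record (MongoDB's role)
        self.index = {}     # search index (Solr's role)
        self.dirty = set()  # ids written to primary but not yet indexed

    def save(self, doc_id, doc, index_available=True):
        # Every write touches two systems; the second one may be unreachable.
        self.primary[doc_id] = doc
        self.dirty.add(doc_id)
        if index_available:
            self._index(doc_id)

    def _index(self, doc_id):
        # Transformation step: only the searchable fields are copied over.
        doc = self.primary[doc_id]
        self.index[doc_id] = {"title": doc["title"].lower()}
        self.dirty.discard(doc_id)

    def resync(self):
        # Replay every write the index missed; returns how many were repaired.
        missed = list(self.dirty)
        for doc_id in missed:
            self._index(doc_id)
        return len(missed)
```

Even this toy needs bookkeeping (`dirty`) and a repair pass (`resync`), and it does not handle deletes or concurrent writers at all, which is where most of the real effort goes.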

From the small-scale tests I have done so far, I haven't found any technical roadblock that would prevent me from using Solr/Lucene as persistence. However, I also don't want to commit to such a drastic refactoring without more information. I am also aware of projects like Solandra, which attempt to bring NoSQL and Solr together, but they don't seem to be mature enough.

Question

So for applications where full-text search is a major (but not the only) requirement, is it feasible to forgo both traditional (RDBMS) and contemporary (NoSQL) data stores?

Great reference (thanks to raticulin):

Atlassian (Jira) - Lucene Generic Data Indexing

Lucene - Full Text Search/Information Retrieval Library. Solr - Enterprise Search Server built on top of Lucene.

Lucene/Solr should not be used in place of a persistence layer. They will not be able to replace an RDBMS, nor is it a good idea to compare them to one: you are comparing apples and oranges.

  1. Comparing Lucene's indexing throughput directly with an RDBMS's will not help, and it is not a fair comparison. A number of factors affect Lucene's throughput, depending on your search schema configuration.

  2. Lucene has one of the best-known and best data structures for information retrieval. The query speed you get depends on a number of factors: configuration, hardware, etc.

  3. Obviously, that's the way to go.

  4. Handling 15M+ documents on a single JVM is great, but that number does not say much without knowing the document size, feature set used, JVM memory, CPU cores, etc.

Now if your problem is that the RDBMS is a real scalability bottleneck, you could pick a NoSQL datastore based on your persistence needs, which you could then integrate with Solr/Lucene to provide full-text search capability. Since NoSQL is fairly new and rapidly evolving, you might not find stable adapters to integrate Solr/Lucene with NoSQL.

Edit:

Now that the question has been updated: this is already well debated in the question "NoSQL (MongoDB) vs Lucene (or Solr) as your database". It can be a pain to have too many moving parts, and Lucene/Solr could very well replace MongoDB, depending on the app. But you have to consider that NoSQL data stores are built from the ground up to be fully distributed, so you don't lose functionality or have it limited when scaling, while Solr was not built with distributed computing in mind, so there are limitations (see "Distributed Search limitations") when it comes to horizontal scaling. SolrCloud may be the answer to that.

I think I remember watching a presentation from Atlassian where they explained that for Jira they were using just Lucene nowadays: they had dropped their previous DB (whatever it was) and were using Lucene as storage too. They were happy.

If someone can confirm it was them, that would be cool.

Edit:

http://blogs.atlassian.com/rebelutionary/downloads/tssjs2007-lucene-generic-data-indexing.pdf
