简体   繁体   中英

Using Neo4j and Lucene in a distributed system

I am looking into Neo4j as a stripped-down document store. A key aspect of document storage is search, and I know Neo4j includes full text search via legacy indices provided by Lucene.

I would be very interested in hearing the limitations of Neo4j search capabilities in a distributed environment. Does it provide a distributed index? In what ways is it inferior to Solr or ElasticSearch? How far can I take it before I must install Solr?

-- EDIT --

We are trying to integrate two distinct search efforts. The first is standard text content search. For instance, using the Enron emails, we want to search for every email that matches "bananas" or "going to the store" and get those document bodies in response. This is where people often turn to Solr.

The second case is more complicated, we have attached a great deal of meta-data to each document. We may have decided that "these" emails were the result of late-night drunk-dialing. Now I want to search for all emails that may have been the result of late-night drunk-dialing. For this kind of meta-data, we believe a graph database is in order.

In a perfect world, I can use one platform to perform both queries. I appreciate that Neo4j (nor OrientDB, Arango, etc) are designed as full text search databases, but I'm trying to understand the limitations thereof.

In terms of volume, we are dealing at a very large scale with batch-style nightly updates. The data is content heavy, with some documents running into hundreds of pages of text, but mostly on the order of a page or two.

Neo4j can provide a "distributed index" in the sense that the high availability cluster can make your index available on more than one machine, but I'm pretty sure that's not what you're after. Related to this issue is a different answer I wrote about graph partitioning, and what it takes to distribute a really large number of nodes/relationships across multiple machines. (It's not terribly simple)

Solr and Lucene do two different things (although Solr is built on top of Lucene). I think solr and neo4j are not comparable because they're trying to do completely different things. This site isn't about software recommendations so I can't tell you what you should use other than to say you should read up on solr and neo4j, and figure out which set of functionality you want. As far as I know, this is an exclusive decision as I'm not aware of people integrating solr with neo4j.

Your question is very difficult to answer, I'd recommend expanding on what you are trying to do and what you have tried, you'll probably get better responses.

我曾经在一个健康社交网络上工作,我们需要某种搜索和连接搜索功能,我们首先在neo4j上进行了开发,我们对可以获取并表达任何请求的密码查询语言印象深刻,但是当您将数十亿个节点扔到那里时,您就会启动为了付出代价,我们开始考虑另一个图形数据库,这次我们进行了大量的研究,测试,而OrientDB显然是赢家,OrientDB具有高度的可扩展性,但问题是您必须自己编写代码,算法”,如果您想做一些高级的事情(这两个节点之间的共同点是什么),否则您将拥有类似SQL的查询语言(我不知道/记得他是否有名字),但是您可以做一些有趣的事情因此,总的来说,我肯定会继续使用OrientDB

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM