
How to handle a large dataset using Neo4j and Gremlin?

I have around 88 million nodes and 200 million edges. I am using a Neo4j DB and loading it through BatchGraph with Gremlin. Is it advisable to run Gremlin queries against a dataset of this size from the Gremlin REPL? I mean, can I avoid timeouts or heap-related issues?
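For reference, my loading code looks roughly like the sketch below (Gremlin 2.x / Blueprints syntax; the database path, buffer size, ids, labels, and property names are placeholders rather than my real values):

    import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph
    import com.tinkerpop.blueprints.util.wrappers.batch.BatchGraph
    import com.tinkerpop.blueprints.util.wrappers.batch.VertexIDType

    // Wrap the Neo4j store in BatchGraph so mutations are buffered and
    // committed in chunks instead of one transaction per element.
    g  = new Neo4jGraph('/tmp/my-graph')
    bg = new BatchGraph(g, VertexIDType.NUMBER, 10000)  // commit every 10k mutations

    v1 = bg.addVertex(1L)
    v1.setProperty('name', 'alpha')
    v2 = bg.addVertex(2L)
    v2.setProperty('name', 'beta')
    bg.addEdge(null, v1, v2, 'connects')  // BatchGraph assigns edge ids itself

    // shutdown() flushes the remaining buffer and closes the wrapped graph
    bg.shutdown()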

Currently, using the Faunus API for a Hadoop MapReduce setup is out of scope for us.

Can I handle this with a plain Neo4j DB and Gremlin? Is there any alternative or solution?

I think Marko/Peter both gave good answers to this on the gremlin-users mailing list:

https://groups.google.com/forum/#!topic/gremlin-users/w3xM4YJTA2I

I'm not sure I'm saying much more than they said, but I'll repeat a bit of it in my own words. The answer largely depends on what you intend to do with your graph and on the structure of the graph itself. If your workload is mostly local traversals (i.e., you start at some vertex and traverse out from there) and you don't expect a lot of supernodes, then Gremlin and Neo4j should do just fine. Give the JVM a lot of memory, do a bit of Neo4j-specific tuning, and you should be quite pleased. If, on the other hand, your traversals are more global in nature (i.e., they start with g.V or g.E), so that you have to touch the entire graph to do your calculation, then you will be less pleased. It takes a long time to iterate over tens or hundreds of millions of things.
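To make the distinction concrete, here is a rough sketch in Gremlin 2.x syntax (the ids, labels, and property names are invented for illustration):

    // Local traversal: start from a single known vertex and walk outward.
    // Only that vertex's neighborhood is touched, so this stays fast
    // even on a graph with tens of millions of elements.
    g.v(12345).out('knows').out('knows').name

    // Global traversal: starts from g.V, so all ~88M vertices are iterated
    // before the filter applies (unless an index backs the lookup). This is
    // the slow, heap-hungry case on a graph of your size.
    g.V.has('age', T.gte, 30).count()

For the REPL itself, giving the JVM more heap helps with memory errors, e.g. by exporting JAVA_OPTIONS="-Xms2g -Xmx8g" before launching gremlin.sh (the exact variable and values depend on your setup and version, so treat them as an assumption).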

Ultimately, you have to understand the problem you are facing, your use cases, your graph's structure, and the strengths and weaknesses of the available graph databases in order to decide how to approach a graph of that size.
