We have a graph that is computed on Spark and stored in Cassandra.
There is also a REST API with an endpoint that returns a graph node together with its edges and the edges of those edges.
This second-degree subgraph may include up to 70,000 nodes.
We currently use Cassandra as the database, but extracting a large amount of data by key from Cassandra takes significant time and resources.
We tried TitanDB, Neo4j and OrientDB to improve performance, but Cassandra still showed the best results.
Now there is another idea: persist the RDD (or maybe a GraphX graph) inside the API service and, on each API call, filter the necessary data out of the persisted RDD.
I guess this will work fast while the RDD fits in memory, but once it spills to disk it will behave like a full scan (e.g. a full scan of a Parquet file). I also expect that we will run into other issues.
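To make the proposed per-request filtering concrete, here is a minimal, Spark-free sketch in plain Scala. All names are hypothetical; a real service would hold a persisted `RDD[(Long, Long)]` or a GraphX `Graph` instead of this in-memory `Seq`, but the access pattern is the same: every lookup is a linear scan over the whole edge set, which is exactly the full-scan concern above.

```scala
// Hypothetical sketch: the cached edge set stands in for a persisted RDD.
object GraphLookup {
  // (src, dst) edge pairs, standing in for the cached edge RDD
  val edges: Seq[(Long, Long)] = Seq((1L, 2L), (2L, 3L), (1L, 4L), (4L, 5L))

  // First-degree neighbours: one linear scan, analogous to RDD.filter
  def neighbours(id: Long): Set[Long] =
    edges.collect { case (s, d) if s == id => d }.toSet

  // Second-degree expansion (edges of edges): repeated scans over the
  // edge set per request, which is what becomes expensive once the
  // cached data no longer fits in memory
  def secondDegree(id: Long): Set[Long] = {
    val first = neighbours(id)
    first ++ first.flatMap(neighbours)
  }
}
```

With a real Spark RDD the equivalent would be `edges.filter(_._1 == nodeId).collect()` against an RDD persisted with `StorageLevel.MEMORY_AND_DISK`; partitioning the RDD by source node id (e.g. with a `HashPartitioner`) could at least limit each scan to one partition.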
Does anybody have experience with this?
Spark is NOT a storage engine. Unless you process a large amount of data on each request, you should consider other options.