
Use a Spark RDD as a source of data in a REST API

There is a graph that is computed on Spark and stored in Cassandra.
There is also a REST API with an endpoint that returns a graph node together with its edges and the edges of those edges.
This second-degree subgraph may include up to 70,000 nodes.
We currently use Cassandra as the database, but extracting a lot of data by key from Cassandra takes considerable time and resources.
We tried TitanDB, Neo4j and OrientDB to improve performance, but Cassandra showed the best results.

Now there is another idea: persist the RDD (or maybe a GraphX object) in the API service and, on each API call, filter the necessary data out of the persisted RDD.
I expect this to be fast as long as the RDD fits in memory, but once it is cached to disk it will behave like a full scan (e.g. a full scan of a Parquet file). I also expect that we will face these issues (a minimal sketch of the pattern follows the list):

  • memory leaks in Spark;
  • updating the RDD (unpersisting the previous one, reading and persisting a new one) will require stopping the API;
  • concurrent use of the RDD will require manually managing CPU resources.
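
Here is a minimal sketch of what I have in mind, in Scala. Everything named here is hypothetical (GraphCache, the Adjacency shape, the partition count, the loader function); the two points it illustrates are keeping the RDD hash-partitioned by node id, so that lookup(key) runs a job over a single partition instead of scanning the whole dataset, and swapping snapshots through an AtomicReference, so a refresh does not require stopping the API:

    import java.util.concurrent.atomic.AtomicReference

    import org.apache.spark.HashPartitioner
    import org.apache.spark.rdd.RDD
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.storage.StorageLevel

    object GraphCache {
      // Hypothetical shape of the data: node id -> ids of its direct neighbours.
      type Adjacency = (Long, Seq[Long])

      // The snapshot currently being served. Swapped atomically on refresh,
      // so updating the data never requires stopping the API.
      private val current = new AtomicReference[RDD[Adjacency]]()

      // Load a fresh snapshot, hash-partition it by node id and cache it.
      // With a known partitioner, lookup(key) runs a job on the single
      // partition that owns the key instead of scanning the whole RDD.
      def refresh(spark: SparkSession, load: SparkSession => RDD[Adjacency]): Unit = {
        val fresh = load(spark)
          .partitionBy(new HashPartitioner(128)) // partition count is a guess
          .persist(StorageLevel.MEMORY_AND_DISK)
        fresh.count() // materialise the cache before exposing it
        val previous = current.getAndSet(fresh)
        if (previous != null) previous.unpersist(blocking = false)
      }

      // Called from the REST handler: direct neighbours of one node.
      // Assumes refresh() has been called at least once.
      def neighbours(nodeId: Long): Seq[Long] =
        current.get().lookup(nodeId).flatten
    }

Note that even with a partitioner, every lookup still launches a Spark job through the driver, so each read pays job-scheduling overhead and concurrent requests queue on the driver; that is essentially the third concern above.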

Does anybody have experience with this?

Spark is NOT a storage engine. Unless you are going to process a large amount of data each time, you should consider:

