I have to design a piece of software on a three layer architecture:
My data is simple and perfectly suited to being stored in a single RDBMS table, or I can store it in Cassandra. Periodically I would then need Spark to run some machine learning algorithms on the whole data set.
Which of the two databases better suits my use case? In detail, I do not need to scale across multiple nodes, and I think the main underlying questions are:
Is simple querying (SELECT) on a simple table faster on Cassandra or on MySQL?
Does the Spark connector for Cassandra benefit from some Cassandra features that make it faster than a SQL connector?
You can use MySQL if the data size is less than 2 TB. SELECT queries on a MySQL table will be more flexible than in Cassandra. You should use Cassandra when your data storage requirement exceeds what a single machine can hold. Cassandra needs careful data modeling for each lookup or SELECT scenario.
You can use the approach suggested below for MySQL and Spark integration.
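As a minimal sketch of one way such an integration can look, the snippet below reads a MySQL table through Spark's built-in JDBC data source. It assumes PySpark and the MySQL Connector/J jar are available to Spark. The helper names, host, database, table, credentials, and partition column are hypothetical placeholders, not details from the question.

```python
def mysql_jdbc_options(host, database, table, user, password,
                       partition_column=None, lower_bound=None,
                       upper_bound=None, num_partitions=None):
    """Build the option dict for Spark's JDBC data source (hypothetical helper)."""
    opts = {
        "url": f"jdbc:mysql://{host}:3306/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class name used by MySQL Connector/J 8.x (assumed version).
        "driver": "com.mysql.cj.jdbc.Driver",
    }
    if partition_column is not None:
        # Spark expects these four options together; they split the read
        # into parallel range queries over a numeric column.
        opts.update({
            "partitionColumn": partition_column,
            "lowerBound": str(lower_bound),
            "upperBound": str(upper_bound),
            "numPartitions": str(num_partitions),
        })
    return opts


def load_mysql_table(spark, opts):
    """Return a DataFrame for the table; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**opts).load()
```

With a running session, `load_mysql_table(spark, mysql_jdbc_options("localhost", "appdb", "events", "user", "secret"))` would return a DataFrame that can be passed to Spark's machine learning APIs. Without the partition options the whole table is read through a single connection, which may be acceptable for a small single-table data set.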
It all depends on the data: size, integrity, scale, flexible schema, sharding, etc.
Use MySQL if:
Use Cassandra if:
Have a look at this benchmarking article and this pdf
I think it's better to use a SQL database such as MySQL. Cassandra should only be used if you need to scale your data to much larger sizes and across many datacenters. The Java Cassandra JDBC driver is just a normal driver for connecting to Cassandra; it doesn't have any special advantages over other database drivers.
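For comparison, here is a rough sketch of what reading a Cassandra table from Spark can look like, assuming the DataStax Spark Cassandra Connector package is on the Spark classpath and the session was created with `spark.cassandra.connection.host` pointing at a Cassandra node. The helper names, keyspace, and table are hypothetical. The sketch only shows the shape of the call and makes no claim about relative speed.

```python
# Data source name registered by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_read_options(keyspace, table):
    """Build the read options for a Cassandra table (hypothetical helper)."""
    return {"keyspace": keyspace, "table": table}


def load_cassandra_table(spark, keyspace, table):
    """Return a DataFrame for the table; `spark` is an existing SparkSession."""
    opts = cassandra_read_options(keyspace, table)
    return spark.read.format(CASSANDRA_FORMAT).options(**opts).load()
```

From the application's point of view both reads produce an ordinary DataFrame, so the machine learning code running on top does not need to change with the storage choice.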