
Cassandra + Spark vs MySQL + Spark

I have to design a piece of software with a three-layer architecture:

  • A process that periodically polls a data source, such as an FTP server, and loads the data into a database
  • A database
  • Spark for the processing of the data

My data is simple and perfectly suitable for storage in a single RDBMS table, or I could store it in Cassandra. Periodically, I would need Spark to run some machine learning algorithms on the whole data set.

Which database better suits my use case? Specifically, I do not need to scale across multiple nodes, and I think the main underlying questions are:

  • Is simple querying (SELECT) on a simple table faster on Cassandra or on MySQL?

  • Does the Spark connector for Cassandra take advantage of any Cassandra features that make it faster than a SQL connector?

You can use MySQL if the data size is less than about 2 TB. SELECT queries on a MySQL table are more flexible than in Cassandra. Use Cassandra when your storage requirements exceed what a single machine can hold. Cassandra also requires careful data modeling for each lookup or SELECT scenario.

For MySQL and Spark integration, you can use the approach suggested in the following question:

How to work with MySQL and Apache Spark?
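
As a rough illustration of that integration, here is a minimal sketch of reading a MySQL table into a Spark DataFrame through Spark's built-in JDBC data source. The helper names (`mysql_jdbc_options`, `read_mysql_table`) and the host, database, and table values are hypothetical, and the sketch assumes the MySQL Connector/J jar is available on the Spark classpath.

```python
from typing import Dict, Optional


def mysql_jdbc_options(
    host: str,
    database: str,
    table: str,
    user: str,
    password: str,
    port: int = 3306,
    partition_column: Optional[str] = None,
    lower_bound: Optional[int] = None,
    upper_bound: Optional[int] = None,
    num_partitions: Optional[int] = None,
) -> Dict[str, str]:
    """Build the option map for Spark's JDBC data source (hypothetical helper)."""
    options = {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped with MySQL Connector/J 8.x.
        "driver": "com.mysql.cj.jdbc.Driver",
    }
    if partition_column is not None:
        # Spark needs all four settings together to split the read
        # into several parallel range queries.
        if lower_bound is None or upper_bound is None or num_partitions is None:
            raise ValueError("partitioned reads need bounds and a partition count")
        options.update(
            partitionColumn=partition_column,
            lowerBound=str(lower_bound),
            upperBound=str(upper_bound),
            numPartitions=str(num_partitions),
        )
    return options


def read_mysql_table(spark, options: Dict[str, str]):
    """Load the table as a DataFrame; `spark` is an existing SparkSession."""
    return spark.read.format("jdbc").options(**options).load()
```

Without the partitioning options, Spark reads the whole table through a single JDBC connection, which may be acceptable for a small table but is worth keeping in mind when the job scans the full data set.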

It all depends on the data: size, integrity, scale, schema flexibility, sharding, etc.

Use MySQL if:

  1. Data size is small (single-digit TBs)
  2. Strong consistency (ACID: Atomicity, Consistency, Isolation, and Durability) is required

Use Cassandra if:

  1. Data size is huge and horizontal scalability is required
  2. Eventual consistency is acceptable (BASE: Basically Available, Soft state, Eventual consistency)
  3. A flexible schema is needed
  4. The application is distributed

Have a look at this benchmarking article and this pdf

I think it's better to use a SQL database such as MySQL. Cassandra should only be used if you need to scale your data to much larger volumes and across many datacenters. The Java Cassandra JDBC driver is just a normal driver for connecting to Cassandra; it doesn't have any special advantages over other database drivers.
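
For comparison, Spark jobs usually read Cassandra through the Spark Cassandra Connector's data source rather than through a JDBC driver. The following is a minimal sketch of that read path; the helper name `read_cassandra_table` and the keyspace and table values are hypothetical, and it assumes the connector package is on the Spark classpath and that `spark.cassandra.connection.host` is set in the session configuration.

```python
from typing import Dict

# Data source name registered by the Spark Cassandra Connector.
CASSANDRA_FORMAT = "org.apache.spark.sql.cassandra"


def cassandra_read_options(keyspace: str, table: str) -> Dict[str, str]:
    """Options identifying the Cassandra table to read (hypothetical helper)."""
    return {"keyspace": keyspace, "table": table}


def read_cassandra_table(spark, keyspace: str, table: str):
    """Load a Cassandra table as a DataFrame; `spark` is an existing SparkSession."""
    options = cassandra_read_options(keyspace, table)
    return spark.read.format(CASSANDRA_FORMAT).options(**options).load()
```

From the Spark side the two read paths look almost identical, so the choice of database mostly comes down to the data-size and consistency considerations discussed in the answers above.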
