
Using Spark to save data to Cassandra

In my current architecture I have one module responsible for writing/reading data to and from Cassandra, and another module responsible for downloading data. Recently I started using DataStax and Spark, and I want to perform some transformations on newly acquired data. What's the right approach to this problem? Should I use my existing module for storing data and run the Spark calculations separately, or send the downloaded data directly to Spark using Spark Streaming and, within a job, save both the original data and the transformed data to Cassandra? I'm operating on stock quotes, so a lot of data is downloaded continuously and there are many transformations.
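To make the second option concrete, here is a minimal sketch of a single streaming job that persists both the raw quotes and a transformed view to Cassandra via the DataStax Spark Cassandra Connector. The keyspace/table names, the `Quote` case class, the socket source, and the per-batch average are all hypothetical placeholders, not anything from your actual setup:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector.streaming._

case class Quote(symbol: String, ts: Long, price: Double)
case class AvgPrice(symbol: String, avg: Double)

object QuoteStreamJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("quote-stream")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assume quotes arrive as "SYMBOL,TIMESTAMP,PRICE" lines on a socket;
    // in practice this would likely be Kafka or another receiver.
    val quotes = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(sym, ts, price) = line.split(",")
      Quote(sym, ts.toLong, price.toDouble)
    }

    // Write the raw data as-is...
    quotes.saveToCassandra("market", "raw_quotes")

    // ...and a transformed view (here: mean price per symbol per batch).
    quotes
      .map(q => (q.symbol, (q.price, 1)))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .map { case (sym, (sum, n)) => AvgPrice(sym, sum / n) }
      .saveToCassandra("market", "avg_quotes")

    ssc.start()
    ssc.awaitTermination()
  }
}
```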

In my opinion, it's better to keep them separated.

First store the raw data, then process it. It's easier to scale and maintain each component later.

For example, if you want to change something in your downloading module, such as adding a new download source or fixing a bug, it won't affect the data processing done in Spark; and changing the code running on Spark won't have any effect on (or introduce a bug into) the raw data you already downloaded.
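A minimal sketch of this separated approach: the downloader writes raw quotes on its own schedule, and an independent Spark batch job later reads them back from Cassandra, transforms them, and stores the result in a separate table. The keyspace, table, and column names and the daily-high transform are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

case class DailyHigh(symbol: String, day: String, high: Double)

object QuoteTransformJob {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("quote-transform")
      .set("spark.cassandra.connection.host", "127.0.0.1") // assumed host
    val sc = new SparkContext(conf)

    // Read the raw quotes previously stored by the download module.
    val raw = sc.cassandraTable[(String, String, Double)]("market", "raw_quotes")
      .select("symbol", "day", "price")

    // Compute the daily high per symbol and write it to its own table,
    // leaving the raw data untouched.
    raw
      .map { case (sym, day, price) => ((sym, day), price) }
      .reduceByKey((a, b) => math.max(a, b))
      .map { case ((sym, day), high) => DailyHigh(sym, day, high) }
      .saveToCassandra("market", "daily_highs")

    sc.stop()
  }
}
```

Because the job only ever writes to `daily_highs`, a bug in the transformation can be fixed and the job re-run without any risk to `raw_quotes`.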
