简体   繁体   中英

Real time analytic using Apache Spark

I am using Apache Spark to analyse the data from Cassandra and will insert the data back into Cassandra by designing new tables in Cassandra as per our queries. I want to know that whether it is possible for spark to analyze in real time? If yes then how? I have read so many tutorials regarding this, but found nothing.

I want to perform the analysis and insert into Cassandra whenever a data comes into my table instantaneously.

This is possible with Spark Streaming, you should take a look at the demos and documentation which comes packaged with the Spark Cassandra Connector.

https://github.com/datastax/spark-cassandra-connector

This includes support for streaming, as well as support for creating new tables on the fly.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md

Spark Streaming extends the core API to allow high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Akka, Kafka, Flume, Twitter, ZeroMQ, TCP sockets, etc. Results can be stored in Cassandra.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#saving-rdds-as-new-tables

Use saveAsCassandraTable method to automatically create a new table with given name and save the RDD into it. The keyspace you're saving to must exist. The following code will create a new table words_new in keyspace test with columns word and count, where word becomes a primary key:

case class WordCount(word: String, count: Long) val collection = sc.parallelize(Seq(WordCount("dog", 50), WordCount("cow", 60))) collection.saveAsCassandraTable("test", "words_new", SomeColumns("word", "count"))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM