
Real-time analytics using Apache Spark

Original 2015-03-31 09:23:49 9 1 java/ cassandra/ apache-spark/ bigdata/ cql3

I am using Apache Spark to analyse data from Cassandra, and I will insert the results back into Cassandra in new tables designed around our queries. Is it possible for Spark to do this analysis in real time? If so, how? I have read many tutorials on this but found nothing.

I want to perform the analysis and insert the results into Cassandra as soon as new data arrives in my table.

1 answer

This is possible with Spark Streaming. Take a look at the demos and documentation that come packaged with the Spark Cassandra Connector.

https://github.com/datastax/spark-cassandra-connector

This includes support for streaming, as well as support for creating new tables on the fly.

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/8_streaming.md

Spark Streaming extends the core API to allow high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources such as Akka, Kafka, Flume, Twitter, ZeroMQ, TCP sockets, etc. Results can be stored in Cassandra.
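As a rough illustration of the paragraph above, here is a minimal sketch of a streaming job that counts words per batch and writes the counts to Cassandra. It is not taken from the original answer. It assumes a local Cassandra node at 127.0.0.1, a text source on a TCP socket at localhost:9999, and an existing table test.words with columns word (text, primary key) and count (bigint); the object name StreamingWordCount and the helper tokenize are hypothetical. Note that it reacts to data arriving on the stream source, not to rows being written to a Cassandra table.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

object StreamingWordCount {
  // Hypothetical helper: split a line into lowercase words, dropping empty tokens.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // Assumed settings: local master and a Cassandra node on 127.0.0.1.
    val conf = new SparkConf(true)
      .setAppName("StreamingWordCount")
      .setMaster("local[2]")
      .set("spark.cassandra.connection.host", "127.0.0.1")

    // Process incoming data in 1-second micro-batches.
    val ssc = new StreamingContext(conf, Seconds(1))

    // Assumed source: lines of text on a TCP socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Count words within each batch.
    val counts = lines
      .flatMap(line => tokenize(line))
      .map(word => (word, 1L))
      .reduceByKey(_ + _)

    // Write each batch to the assumed, pre-existing table test.words.
    counts.saveToCassandra("test", "words", SomeColumns("word", "count"))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Each batch overwrites the row for a word with that batch's count, so this sketch does not keep a running total. A running total would need something extra, such as a Cassandra counter column or stateful stream operations.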

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/5_saving.md#saving-rdds-as-new-tables

Use the saveAsCassandraTable method to automatically create a new table with the given name and save the RDD into it. The keyspace you're saving to must exist. The following code will create a new table words_new in keyspace test with columns word and count, where word becomes the primary key:

import com.datastax.spark.connector._

// sc is an existing SparkContext configured for Cassandra
case class WordCount(word: String, count: Long)
val collection = sc.parallelize(Seq(WordCount("dog", 50), WordCount("cow", 60)))
collection.saveAsCassandraTable("test", "words_new", SomeColumns("word", "count"))



The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


solution1 1 ACCEPTED 2015-03-31 17:17:20
