
How to save data from Spark Streaming to Cassandra using Java?

I receive entries from a stream in a Linux terminal, assign them to lines, and break them into words. But instead of printing them out I want to save them to Cassandra. I have a keyspace named ks with a table inside it named record. I know that some code like CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra(); should do the job, but I guess I am doing something wrong. Can someone help?

Here is my Cassandra ks.record schema (I added this data through CQLSH):

 id | birth_date                      | name
----+---------------------------------+-----------
 10 | 1987-12-01 23:00:00.000000+0000 | Catherine
 11 | 2004-09-07 22:00:00.000000+0000 |   Isadora
  1 | 2016-05-10 13:00:04.452000+0000 |      John
  2 | 2016-05-10 13:00:04.452000+0000 |      Troy
 12 | 1970-10-01 23:00:00.000000+0000 |      Anna
  3 | 2016-05-10 13:00:04.452000+0000 |    Andrew

Here is my Java code:

import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

import java.util.Arrays;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.*;


public class CassandraStreaming2 {
    public static void main(String[] args) {

        // Create a local StreamingContext with two working thread and batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("CassandraStreaming");
        JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = sc.socketTextStream("localhost", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(
                (FlatMapFunction<String, String>) x -> Arrays.asList(x.split(" "))
        );

        words.print();
        //CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra();

        sc.start();              // Start the computation
        sc.awaitTermination();   // Wait for the computation to terminate

    }
}

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/7_java_api.md#saving-data-to-cassandra

As per the docs, you also need to pass a RowWriterFactory. The most common way to provide one is the mapToRow(Class) API; that is the missing parameter in your writerBuilder call.

But you have an additional problem: your code does not yet present the data in a form that can be written to C*. You have a JavaDStream of plain Strings, and a single String cannot be turned into a Cassandra row for your given schema.

Basically you are telling the connector:

Write "hello" to CassandraTable (id, birth_date, name)

without telling it where "hello" goes (what should the id be? what should the birth_date be?).
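To make that concrete, here is a sketch of the complete save path: map each input line into a JavaBean that carries every column, then pass mapToRow(BeanClass) as the RowWriterFactory. The Record bean, the "&lt;id&gt; &lt;name&gt;" input format, and the 127.0.0.1 Cassandra host are illustrative assumptions, not part of the question.

```java
import java.io.Serializable;
import java.util.Date;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.javaFunctions;

public class CassandraStreamingSave {

    // Hypothetical bean matching ks.record; the connector's default mapper
    // pairs camelCase fields with snake_case columns (birthDate <-> birth_date).
    public static class Record implements Serializable {
        private Integer id;
        private Date birthDate;
        private String name;

        public Record() { }
        public Record(Integer id, Date birthDate, String name) {
            this.id = id; this.birthDate = birthDate; this.name = name;
        }
        public Integer getId() { return id; }
        public void setId(Integer id) { this.id = id; }
        public Date getBirthDate() { return birthDate; }
        public void setBirthDate(Date birthDate) { this.birthDate = birthDate; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setMaster("local[2]")
                .setAppName("CassandraStreaming")
                .set("spark.cassandra.connection.host", "127.0.0.1"); // assumed local C*
        JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Turn each "<id> <name>" line into a full Record so every column is known.
        JavaDStream<Record> records = sc.socketTextStream("localhost", 9999)
                .map(line -> {
                    String[] parts = line.split(" ", 2);
                    return new Record(Integer.parseInt(parts[0]), new Date(), parts[1]);
                });

        // The third argument is the previously missing RowWriterFactory:
        // mapToRow(Record.class) tells the connector how a Record becomes a row.
        javaFunctions(records)
                .writerBuilder("ks", "record", mapToRow(Record.class))
                .saveToCassandra();

        sc.start();              // Start the computation
        sc.awaitTermination();   // Wait for the computation to terminate
    }
}
```

This sketch needs the spark-cassandra-connector dependency on the classpath and a reachable Cassandra node, so it is not runnable standalone; it only illustrates how the RowWriterFactory and the String-to-row mapping fit together.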
