
How to save data from Spark Streaming to Cassandra using Java?

I get some entries from a stream in a Linux terminal, assign them to lines, and split them into words. But instead of printing them out, I want to save them to Cassandra. I have a keyspace named ks with a table inside it named record. I know that something like CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra(); should do the job, but I guess I am doing something wrong. Can someone help?

Here is my Cassandra table ks.record (I added this data through cqlsh):

 id | birth_date                      | name
----+---------------------------------+-----------
 10 | 1987-12-01 23:00:00.000000+0000 | Catherine
 11 | 2004-09-07 22:00:00.000000+0000 |   Isadora
  1 | 2016-05-10 13:00:04.452000+0000 |      John
  2 | 2016-05-10 13:00:04.452000+0000 |      Troy
 12 | 1970-10-01 23:00:00.000000+0000 |      Anna
  3 | 2016-05-10 13:00:04.452000+0000 |    Andrew

Here is my Java code:

import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

import java.util.Arrays;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.*;


public class CassandraStreaming2 {
    public static void main(String[] args) {

        // Create a local StreamingContext with two working threads and a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("CassandraStreaming");
        JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = sc.socketTextStream("localhost", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(
                (FlatMapFunction<String, String>) x -> Arrays.asList(x.split(" "))
        );

        words.print();
        //CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra();

        sc.start();              // Start the computation
        sc.awaitTermination();   // Wait for the computation to terminate

    }
}

https://github.com/datastax/spark-cassandra-connector/blob/master/doc/7_java_api.md#saving-data-to-cassandra

As per the docs, you also need to pass a RowWriterFactory as the third argument of writerBuilder. The most common way to get one is the mapToRow(Class) API; that is the missing parameter the docs describe.
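A minimal sketch of the kind of class mapToRow expects, assuming the row type is a plain JavaBean. The name RecordRow and the property types (Integer for id, java.util.Date for the timestamp column birth_date) are assumptions based on the rows shown in the question. As far as I know, the connector matches bean properties to columns by name and also accepts the camelCase form birthDate for the column birth_date. With a hypothetical JavaDStream<RecordRow> named records, the write would then be CassandraStreamingJavaUtil.javaFunctions(records).writerBuilder("ks", "record", mapToRow(RecordRow.class)).saveToCassandra();

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical JavaBean for the ks.record table (id, birth_date, name).
// Serializable because Spark ships instances between executors; a no-arg
// constructor plus getters/setters lets it be mapped as a bean.
// The connector docs declare their bean as a public class; it is
// package-private here only to keep the sketch in a single file.
class RecordRow implements Serializable {
    private Integer id;
    private String name;
    private Date birthDate; // assumed to map to the birth_date column

    public RecordRow() { }

    public RecordRow(Integer id, String name, Date birthDate) {
        this.id = id;
        this.name = name;
        this.birthDate = birthDate;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Date getBirthDate() { return birthDate; }
    public void setBirthDate(Date birthDate) { this.birthDate = birthDate; }
}
```

The class deliberately avoids the name Record, which would shadow java.lang.Record on Java 16 and later.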

But you have an additional problem: your code doesn't yet shape the data in a way that can be written to C* (Cassandra). You have a JavaDStream of plain Strings, and a single String cannot be turned into a Cassandra row for your schema.

Basically, you are telling the connector:

Write "hello" to CassandraTable (id, birth_date, name)

without telling it where "hello" goes (what should the id be? What should the birth date be?).
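One way to answer those questions is to make each input line carry every column. A sketch, assuming (hypothetically) that the terminal sends comma-separated lines such as 13,Maria,1990-01-01 and that birth_date should be stored as midnight UTC; the class name LineParser, the input format and the UTC choice are all assumptions. In the streaming job, a function like this could replace the flatMap over words, with malformed lines filtered out and each parsed row copied into the bean passed to mapToRow.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeParseException;
import java.util.Date;
import java.util.Optional;

// Hypothetical parser: turns one input line "id,name,yyyy-MM-dd" into the three
// values the ks.record table needs (id, name, birth_date). Malformed lines yield
// Optional.empty() so a job could drop them instead of failing the batch.
class LineParser {

    // Parsed column values for one row; field names mirror the table columns.
    static final class ParsedRow {
        final int id;
        final String name;
        final Date birthDate;

        ParsedRow(int id, String name, Date birthDate) {
            this.id = id;
            this.name = name;
            this.birthDate = birthDate;
        }
    }

    static Optional<ParsedRow> parse(String line) {
        // -1 keeps trailing empty fields, so "1,John," is rejected below.
        String[] parts = line.split(",", -1);
        if (parts.length != 3) {
            return Optional.empty();
        }
        try {
            int id = Integer.parseInt(parts[0].trim());
            String name = parts[1].trim();
            // Interpret the date as midnight UTC (assumption) and store it as java.util.Date.
            Date birthDate = Date.from(
                    LocalDate.parse(parts[2].trim()).atStartOfDay(ZoneOffset.UTC).toInstant());
            return Optional.of(new ParsedRow(id, name, birthDate));
        } catch (NumberFormatException | DateTimeParseException e) {
            return Optional.empty();
        }
    }
}
```

A bare word like hello has no id or date, so it is rejected instead of being written as an incomplete row.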

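Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.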

 
© 2020-2024 STACKOOM.COM