How to save data from spark streaming to cassandra using java?
I get some entries from the stream in a linux terminal, assign them as lines, and break them into words. But instead of printing them out I want to save them to Cassandra. I have a keyspace named ks, with a table inside it named record. I know that some code like

CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra();

has to do the job, but I guess I am doing something wrong. Can someone help?
Here is my Cassandra ks.record schema (I added these data through CQLSH):
id | birth_date | name
----+---------------------------------+-----------
10 | 1987-12-01 23:00:00.000000+0000 | Catherine
11 | 2004-09-07 22:00:00.000000+0000 | Isadora
1 | 2016-05-10 13:00:04.452000+0000 | John
2 | 2016-05-10 13:00:04.452000+0000 | Troy
12 | 1970-10-01 23:00:00.000000+0000 | Anna
3 | 2016-05-10 13:00:04.452000+0000 | Andrew
Here is my Java code:
import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import scala.Tuple2;

import java.util.Arrays;

import static com.datastax.spark.connector.japi.CassandraJavaUtil.javaFunctions;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;
import static com.datastax.spark.connector.japi.CassandraStreamingJavaUtil.*;

public class CassandraStreaming2 {
    public static void main(String[] args) {
        // Create a local StreamingContext with two working threads and a batch interval of 1 second
        SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("CassandraStreaming");
        JavaStreamingContext sc = new JavaStreamingContext(conf, Durations.seconds(1));

        // Create a DStream that will connect to hostname:port, like localhost:9999
        JavaReceiverInputDStream<String> lines = sc.socketTextStream("localhost", 9999);

        // Split each line into words
        JavaDStream<String> words = lines.flatMap(
                (FlatMapFunction<String, String>) x -> Arrays.asList(x.split(" "))
        );

        words.print();
        //CassandraStreamingJavaUtil.javaFunctions(words).writerBuilder("ks", "record").saveToCassandra();

        sc.start();            // Start the computation
        sc.awaitTermination(); // Wait for the computation to terminate
    }
}
https://github.com/datastax/spark-cassandra-connector/blob/master/doc/7_java_api.md#saving-data-to-cassandra
As per the docs, you need to also pass a RowWriter factory. The most common way to do this is to use the mapToRow(Class) api; this is the missing parameter described.
But you have an additional problem: your code doesn't yet specify the data in a way that can be written to C*. You have a JavaDStream of only Strings, and a single String cannot be made into a Cassandra Row for your given schema.
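One common way to fix this (a sketch, not from the original post; the class name WordRecord and the choice of fields are assumptions) is to define a serializable bean whose properties line up with the ks.record columns, so the connector's mapToRow can map each object to a row. Note that the connector's default column mapper translates camelCase field names like birthDate to snake_case column names like birth_date:

```java
import java.io.Serializable;
import java.util.Date;

// Hypothetical bean matching the ks.record schema (id, birth_date, name).
public class WordRecord implements Serializable {
    private Integer id;
    private Date birthDate;   // maps to the birth_date column
    private String name;

    public WordRecord() {}    // the connector needs a no-arg constructor

    public WordRecord(Integer id, Date birthDate, String name) {
        this.id = id;
        this.birthDate = birthDate;
        this.name = name;
    }

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public Date getBirthDate() { return birthDate; }
    public void setBirthDate(Date birthDate) { this.birthDate = birthDate; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```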
Basically you are telling the connector:

Write "hello" to CassandraTable (id, birthday, value)

without telling it where the hello goes (what should the id be? what should the birthday be?).
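Putting it together, the save step would look something like the following. This is a sketch under assumptions, not tested against a live cluster: WordRecord is a hypothetical bean whose fields match the table columns, and how you derive an id and birth_date from each word is up to your application (using the hash code and the current time here is purely illustrative):

```java
import com.datastax.spark.connector.japi.CassandraStreamingJavaUtil;
import org.apache.spark.streaming.api.java.JavaDStream;
import java.util.Date;
import static com.datastax.spark.connector.japi.CassandraJavaUtil.mapToRow;

// Turn each bare String into an object the connector can map to a row.
JavaDStream<WordRecord> records = words.map(
        word -> new WordRecord(word.hashCode(), new Date(), word) // illustrative id/date only
);

// Pass the RowWriter factory via mapToRow(Class) -- the missing parameter.
CassandraStreamingJavaUtil.javaFunctions(records)
        .writerBuilder("ks", "record", mapToRow(WordRecord.class))
        .saveToCassandra();
```

The key change versus the commented-out line in the question is the third argument to writerBuilder: mapToRow(WordRecord.class) tells the connector how to turn each object into a row for the ks.record table.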