
spark-streaming: how to output streaming data to cassandra

I am reading Kafka streaming messages using spark-streaming. Now I want to set Cassandra as my output. I have created a table in Cassandra, "test_table", with columns "key:text primary key" and "value:text". I have successfully mapped the data into JavaDStream<Tuple2<String,String>> data like this:

JavaSparkContext sc = new JavaSparkContext("local[4]", "SparkStream",conf);
JavaStreamingContext jssc = new JavaStreamingContext(sc, new Duration(3000));

JavaPairReceiverInputDStream<String, String> messages = KafkaUtils.createStream(jssc, args[0], args[1], topicMap );
JavaDStream<Tuple2<String,String>> data = messages.map(new Function< Tuple2<String,String>, Tuple2<String,String> >() 
{
    public Tuple2<String,String> call(Tuple2<String, String> message)
    {
        return new Tuple2<String,String>( message._1(), message._2() );
    }
}
);  

Then I created a List:

List<TestTable> list = new ArrayList<TestTable>();

where TestTable is my custom class having the same structure as my Cassandra table, with members "key" and "value":

class TestTable
{
    String key;
    String val;

    public TestTable() {}

    public TestTable(String k, String v)
    {
        key=k;
        val=v;
    }

    public String getKey(){
        return key;
    }

    public void setKey(String k){
        key=k;
    }

    public String getVal(){
        return val;
    }

    public void setVal(String v){
        val=v;
    }

    public String toString(){
        return "Key:"+key+",Val:"+val;
    }
}

Please suggest how I can add the data from JavaDStream<Tuple2<String,String>> data into the List<TestTable> list. I am doing this so that I can subsequently use

JavaRDD<TestTable> rdd = sc.parallelize(list); 
javaFunctions(rdd, TestTable.class).saveToCassandra("testkeyspace", "test_table"); 

to save the RDD data into Cassandra.

I tried coding it this way:

messages.foreachRDD(new Function<Tuple2<String,String>, String>()
                        {
                            public List<TestTable> call(Tuple2<String,String> message)
                            {
                                String k = message._1();
                                String v = message._2();
                                TestTable tbl = new TestTable(k,v);
                                list.put(tbl);
                            }
                        }
                    );

but there seems to be a type mismatch. Please help.

Assuming that the intention of this program is to save the streaming data from Kafka into Cassandra, it's not necessary to dump the JavaDStream<Tuple2<String,String>> data into a List<TestTable> list.

The Spark-Cassandra connector by DataStax supports this functionality directly through its Spark Streaming extensions.

It should be sufficient to use these extensions on the JavaDStream:

javaFunctions(data).writerBuilder("testkeyspace", "test_table", mapToRow(TestTable.class)).saveToCassandra();

instead of draining the data into an intermediary list.
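One detail worth checking: as far as I can tell, mapToRow(TestTable.class) expects the stream's elements to be TestTable objects, so a stream of Tuple2<String,String> would probably need to be mapped to TestTable first. Below is a minimal sketch of that per-record conversion only. It is plain Java so it can run without Spark: Map.Entry stands in for Tuple2, and the class name TupleToRowSketch and the helper toRows are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Plain-Java sketch: Map.Entry stands in for scala.Tuple2 so this runs without Spark.
class TupleToRowSketch {

    // Trimmed copy of the question's TestTable bean.
    static class TestTable {
        private final String key;
        private final String val;

        TestTable(String key, String val) {
            this.key = key;
            this.val = val;
        }

        public String getKey() { return key; }
        public String getVal() { return val; }

        @Override
        public String toString() { return "Key:" + key + ",Val:" + val; }
    }

    // Per-record conversion. In a Spark job this logic would sit inside the
    // function passed to data.map(...), producing a stream of TestTable.
    static TestTable toRow(Map.Entry<String, String> message) {
        return new TestTable(message.getKey(), message.getValue());
    }

    // Hypothetical helper: applies the conversion to one batch of messages.
    static List<TestTable> toRows(List<Map.Entry<String, String>> batch) {
        List<TestTable> rows = new ArrayList<>();
        for (Map.Entry<String, String> message : batch) {
            rows.add(toRow(message));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> batch =
                List.of(Map.entry("k1", "v1"), Map.entry("k2", "v2"));
        for (TestTable row : toRows(batch)) {
            System.out.println(row);
        }
    }
}
```

In the actual job, the same toRow logic would go into a map over the DStream, and the resulting JavaDStream<TestTable> would be what gets passed to javaFunctions(...). I would also expect TestTable to need to implement Serializable there, since Spark ships the objects between processes, but treat that as an assumption to verify against your setup.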
