How to write to Cassandra using foreachBatch() in Java Spark?
I have the following code and I would like to write into Cassandra using Spark 2.4 Structured Streaming's foreachBatch:
Dataset<Row> df = spark
    .readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "topic1")
    .load();

Dataset<Row> values = df.selectExpr(
    "split(value,',')[0] as field1",
    "split(value,',')[1] as field2",
    "split(value,',')[2] as field3",
    "split(value,',')[3] as field4",
    "split(value,',')[4] as field5");

// TODO write into Cassandra
values.writeStream().foreachBatch(
    new VoidFunction2<Dataset<Row>, Long>() {
        public void call(Dataset<Row> dataset, Long batchId) {
            // Transform and write batchDF
        }
    }).start();
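As a side note, the `split(value,',')[i]` expressions in the selectExpr behave, for simple comma-separated values, like Java's `String.split(",")` applied to each record. A plain-Java sketch of that per-row mapping (no Spark required; the input string is made up for illustration):

```java
public class SplitDemo {
    // Mirrors the five "split(value,',')[i] as fieldN" expressions, minus Spark
    static String[] splitValue(String value) {
        return value.split(",");
    }

    public static void main(String[] args) {
        String[] fields = splitValue("a,b,c,d,e");
        System.out.println(fields[0]); // field1
        System.out.println(fields[4]); // field5
    }
}
```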
When you use .foreachBatch, your code works just as it would with normal (batch) datasets. In Java the code could look like the following (full source is here):
.foreachBatch((VoidFunction2<Dataset<Row>, Long>) (df, batchId) ->
df.write()
.format("org.apache.spark.sql.cassandra")
.options(ImmutableMap.of("table", "sttest", "keyspace", "test"))
.mode(SaveMode.Append)
.save()
)
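Putting it together, a fuller Java sketch might look like the following. The table name sttest, keyspace test, and the checkpoint path are illustrative; a checkpoint location is recommended so the query can recover after a restart:

```java
import com.google.common.collect.ImmutableMap;
import org.apache.spark.api.java.function.VoidFunction2;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;

// ... "values" is the streaming Dataset<Row> built above ...

values.writeStream()
    .foreachBatch((VoidFunction2<Dataset<Row>, Long>) (batchDf, batchId) ->
        // Each micro-batch is a plain Dataset<Row>, so the batch writer applies
        batchDf.write()
            .format("org.apache.spark.sql.cassandra")
            .options(ImmutableMap.of("table", "sttest", "keyspace", "test"))
            .mode(SaveMode.Append)
            .save()
    )
    .option("checkpointLocation", "/tmp/checkpoint") // illustrative path
    .start()
    .awaitTermination();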
Update, September 2020: support for Spark Structured Streaming was added in Spark Cassandra Connector 2.5.0.
Try adding it to your pom.xml:
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>2.5.0</version>
</dependency>
After that, import the Cassandra implicits (the snippet below is Scala):
import org.apache.spark.sql.cassandra._
Then you can use the cassandraFormat method on your df:
dataset
.write
.cassandraFormat("table","keyspace")
.save()
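With connector 2.5.0 or later you may not need foreachBatch at all, since the connector registers a streaming sink. As a sketch in Java (keyspace/table names reused from the answer above, checkpoint path illustrative; this sink requires a checkpoint location):

```java
values.writeStream()
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "test")
    .option("table", "sttest")
    .option("checkpointLocation", "/tmp/checkpoint") // required for this sink
    .outputMode("append")
    .start();
```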