UPDATE Cassandra table using spark cassandra connector
I'm facing an issue with the Spark Cassandra connector in Scala while updating a table in my keyspace.
Here is my code:
val query = "UPDATE " + COLUMN_FAMILY_UNIQUE_TRAFFIC + DATA_SET_DEVICE +
" SET a= a + " + b + " WHERE x=" +
x + " AND y=" + y +
" AND z=" + x
println(query)
val KeySpace = new CassandraSQLContext(sparkContext)
KeySpace.setKeyspace(KEYSPACE)
hourUniqueKeySpace.sql(query)
When I execute this code, I'm getting an error like this:
Exception in thread "main" java.lang.RuntimeException: [1.1] failure: ``insert'' expected but identifier UPDATE found
Any idea why this is happening? How can I fix this?
Updating a table with a counter column is feasible via the spark-cassandra-connector. You will have to use DataFrames and the DataFrameWriter method save with mode "append" (or SaveMode.Append if you prefer). Check the code in DataFrameWriter.scala.
For example, given a table:
cqlsh:test> SELECT * FROM name_counter ;
name | surname | count
---------+---------+-------
John | Smith | 100
Zhang | Wei | 1000
Angelos | Papas | 10
The code should look like this:
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Rows holding the deltas to add to each counter
val updateRdd = sc.parallelize(Seq(Row("John", "Smith", 1L),
                                   Row("Zhang", "Wei", 2L),
                                   Row("Angelos", "Papas", 3L)))

// Schema matching the name_counter table
val tblStruct = new StructType(
  Array(StructField("name", StringType, nullable = false),
        StructField("surname", StringType, nullable = false),
        StructField("count", LongType, nullable = false)))

val updateDf = sqlContext.createDataFrame(updateRdd, tblStruct)

updateDf.write.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "name_counter"))
  .mode("append")
  .save()
After UPDATE:
name | surname | count
---------+---------+-------
John | Smith | 101
Zhang | Wei | 1002
Angelos | Papas | 13
The DataFrame conversion can be made simpler by implicitly converting an RDD to a DataFrame: import sqlContext.implicits._ and use .toDF().
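As a rough sketch of that shorter form (the case class NameCountDelta below is hypothetical, introduced only to illustrate .toDF(); the keyspace and table names follow the example above):

import sqlContext.implicits._

// Hypothetical case class mirroring the name_counter schema
case class NameCountDelta(name: String, surname: String, count: Long)

val updateDf = sc.parallelize(Seq(NameCountDelta("John", "Smith", 1L),
                                  NameCountDelta("Zhang", "Wei", 2L),
                                  NameCountDelta("Angelos", "Papas", 3L)))
  .toDF()

updateDf.write.format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "test", "table" -> "name_counter"))
  .mode("append")
  .save()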
Check the full code for this toy application: https://github.com/kyrsideris/SparkUpdateCassandra/tree/master
Since versions are very important here, the above applies to Scala 2.11.7, Spark 1.5.1, spark-cassandra-connector 1.5.0-RC1-s_2.11, and Cassandra 3.0.5. DataFrameWriter has been designated @Experimental since 1.4.0.
I believe that you cannot update natively through the Spark connector. See the documentation:
"The default behavior of the Spark Cassandra Connector is to overwrite collections when inserted into a cassandra table. To override this behavior you can specify a custom mapper with instructions on how you would like the collection to be treated." “Spark Cassandra Connector的默认行为是在插入cassandra表时覆盖集合。要覆盖此行为,您可以指定一个自定义映射器,其中包含有关如何处理集合的说明。”
So you'll want to actually INSERT a new record with an existing key.
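For illustration, a minimal sketch of such an insert-as-upsert using the connector's RDD API, reusing the test.name_counter table from the answer above (in Cassandra, writing a row whose primary key already exists overwrites the regular columns; counter columns are instead incremented by the connector):

import com.datastax.spark.connector._

// Inserting with an existing primary key acts as an upsert in Cassandra.
sc.parallelize(Seq(("John", "Smith", 5L)))
  .saveToCassandra("test", "name_counter", SomeColumns("name", "surname", "count"))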